Preface


It was our privilege to serve as the program chairs for CAV 2018, the 30th International 
Conference on Computer-Aided Verification. CAV is an annual conference dedicated 
to the advancement of the theory and practice of computer-aided formal analysis 
methods for hardware and software systems. CAV 2018 was held in Oxford, UK, July 
14-17, 2018, with the tutorials day on July 13. 

This year, CAV was held as part of the Federated Logic Conference (FLoC) event 
and was collocated with many other conferences in logic. The primary focus of CAV is 
to spur advances in hardware and software verification while expanding to new 
domains such as learning, autonomous systems, and computer security. CAV is at the 
cutting edge of research in formal methods, as reflected in this year’s program. 

CAV 2018 covered a wide spectrum of subjects, from theoretical results to concrete 
applications, including papers on application of formal methods in large-scale industrial 
settings. It has always been one of the primary interests of CAV to include papers that 
describe practical verification tools and solutions and techniques that ensure a high 
practical appeal of the results. The proceedings of the conference are published in 
Springer’s Lecture Notes in Computer Science series. A selection of papers were 
invited to a special issue of Formal Methods in System Design and the Journal of the 
ACM. 

This is the first year that the CAV proceedings are published under an Open Access 
license, thus giving access to CAV proceedings to a broad audience. We hope that this 
decision will increase the scope of practical applications of formal methods and will 
attract even more interest from industry. 

CAV received a very high number of submissions this year (215 overall), resulting
in a highly competitive selection process. We accepted 13 tool papers and 52 regular
papers, which amounts to an acceptance rate of roughly 30% (for both regular papers
and tool papers). The high number of excellent submissions in combination with the 
scheduling constraints of FLoC forced us to reduce the length of the talks to 15 
minutes, giving equal exposure and weight to regular papers and tool papers. 

The accepted papers cover a wide range of topics and techniques, from algorithmic 
and logical foundations of verification to practical applications in distributed,
networked, cyber-physical, and autonomous systems. Other notable topics are synthesis,
learning, security, and concurrency in the context of formal methods. The proceedings 
are organized according to the sessions in the conference. 

The program featured two invited talks by Eran Yahav (Technion), on using deep 
learning for programming, and by Somesh Jha (University of Wisconsin Madison) on 
adversarial deep learning. The invited talks this year reflect the growing interest of the 
CAV community in deep learning and its connection to formal methods. The tutorial 
day of CAV featured two invited tutorials, by Shaz Qadeer on verification of
concurrent programs and by Matteo Maffei on static analysis of smart contracts. The
subjects of the tutorials reflect the increasing volume of research on verification of
concurrent software and, more recently, the question of correctness of smart contracts.
As every year, one of the winners of the CAV award also contributed a presentation. 
The tutorial day featured a workshop in memoriam of Mike Gordon, titled “Three 
Research Vignettes in Memory of Mike Gordon,” organized by Tom Melham and 
jointly supported by CAV and ITP communities. 

Moreover, we continued the tradition of organizing a LogicLounge. Initiated by the 
late Helmut Veith at the Vienna Summer of Logic 2014, the LogicLounge is a series of 
discussions on computer science topics targeting a general audience and has become a 
regular highlight at CAV. This year’s LogicLounge took place at the Oxford Union and 
was on the topic of “Ethics and Morality of Robotics,” moderated by Judy Wajcman 
and featuring a panel of experts on the topic: Luciano Floridi, Ben Kuipers, Francesca 
Rossi, Matthias Scheutz, Sandra Wachter, and Jeannette Wing. We thank May Chan, 
Katherine Fletcher, and Marta Kwiatkowska for organizing this event, and the Vienna 
Center of Logic and Algorithms for their support. 

In addition, CAV attendees enjoyed a number of FLoC plenary talks and events 
targeting the broad FLoC community. 

In addition to the main conference, CAV hosted the Verification Mentoring 
Workshop for junior scientists entering the field and a high number of pre- and 
post-conference technical workshops: the Workshop on Formal Reasoning in
Distributed Algorithms (FRIDA), the Workshop on Runtime Verification for Rigorous
Systems Engineering (RV4RISE), the 5th Workshop on Horn Clauses for Verification 
and Synthesis (HCVS), the 7th Workshop on Synthesis (SYNT), the First International 
Workshop on Parallel Logical Reasoning (PLR), the 10th Working Conference on 
Verified Software: Theories, Tools and Experiments (VSTTE), the Workshop on 
Machine Learning for Programming (MLP), the 11th International Workshop on 
Numerical Software Verification (NSV), the Workshop on Verification of Engineered 
Molecular Devices and Programs (VEMDP), the Third Workshop on Fun With Formal 
Methods (FWFM), the Workshop on Robots, Morality, and Trust through the
Verification Lens, and the IFAC Conference on Analysis and Design of Hybrid Systems
(ADHS). 

The Program Committee (PC) for CAV consisted of 80 members; we kept the 
number large to ensure each PC member would have a reasonable number of papers to 
review and be able to provide thorough reviews. As the review process for CAV is 
double-blind, we kept the number of external reviewers to a minimum, to avoid 
accidental disclosures and conflicts of interest. Altogether, the reviewers drafted over 
860 reviews and made an enormous effort to ensure a high-quality program. Following 
the tradition of CAV in recent years, the artifact evaluation was mandatory for tool 
submissions and optional but encouraged for regular submissions. We used an Artifact 
Evaluation Committee of 25 members. Our goal for artifact evaluation was to provide 
friendly “beta-testing” to tool developers; we recognize that developing a stable tool on 
a cutting-edge research topic is certainly not easy and we hope the constructive 
comments provided by the Artifact Evaluation Committee (AEC) were of help to the 
developers. As a result of the evaluation, the AEC accepted 25 of 31 artifacts 
accompanying regular papers; moreover, all 13 accepted tool papers passed the
evaluation. We are grateful to the reviewers for their outstanding efforts in making sure
each paper was fairly assessed. We would like to thank our artifact evaluation chair,
Igor Konnov, and the AEC for evaluating all artifacts submitted with tool papers as
well as optional artifacts submitted with regular papers. 

Of course, without the tremendous effort put into the review process by our PC 
members this conference would not have been possible. We would like to thank the PC 
members for their effort and thorough reviews. 

We would like to thank the FLoC chairs, Moshe Vardi, Daniel Kroening, and Marta 
Kwiatkowska, for the support provided, Thanh Hai Tran for maintaining the CAV 
website, and the always helpful Steering Committee members Orna Grumberg, Aarti 
Gupta, Daniel Kroening, and Kenneth McMillan. Finally, we would like to thank the 
team at the University of Oxford, who took care of the administration and organization 
of FLoC, thus making our jobs as CAV chairs much easier. 
Abstract. Fueled by massive amounts of data, models produced by
machine-learning (ML) algorithms, especially deep neural networks, are
being used in diverse domains where trustworthiness is a concern, including
automotive systems, finance, health care, natural language processing,
and malware detection. Of particular concern is the use of ML algorithms
in cyber-physical systems (CPS), such as self-driving cars and
aviation, where an adversary can cause serious consequences.

However, existing approaches to generating adversarial examples and
devising robust ML algorithms mostly ignore the semantics and context
of the overall system containing the ML component. For example,
in an autonomous vehicle using deep learning for perception, not every
adversarial example for the neural network might lead to a harmful
consequence. Moreover, one may want to prioritize the search for adversarial
examples towards those that significantly modify the desired semantics
of the overall system. Along the same lines, existing algorithms for
constructing robust ML algorithms ignore the specification of the overall
system. In this paper, we argue that the semantics and specification of
the overall system have a crucial role to play in this line of research. We
present preliminary research results that support this claim.


1 Introduction 


Machine learning (ML) algorithms, fueled by massive amounts of data, are
increasingly being utilized in several domains, including healthcare, finance, and
transportation. Models produced by ML algorithms, especially deep neural
networks (DNNs), are being deployed in domains where trustworthiness is a big
concern, such as automotive systems [35], finance [25], health care [2], computer
vision [28], speech recognition [17], natural language processing [38], and
cyber-security [8,42]. Of particular concern is the use of ML (including deep learning) in
cyber-physical systems (CPS) [29], where the presence of an adversary can cause
serious consequences. For example, much of the technology behind autonomous
and driverless vehicle development is “powered” by machine learning [4,14].
DNNs have also been used in airborne collision avoidance systems for unmanned
aircraft (ACAS Xu) [22]. However, in designing and deploying these algorithms
in critical cyber-physical systems, the presence of an active adversary is often
ignored.
Adversarial machine learning (AML) is a field concerned with the analysis of
ML algorithms under adversarial attacks, and the use of such analysis in making ML
algorithms robust to attacks. It is part of the broader agenda for safe and verified
ML-based systems [39,41]. In this paper, we first give a brief survey of the field
of AML, with a particular focus on deep learning. We focus mainly on attacks that
target the outputs or models produced by ML algorithms and that occur after training
(“external attacks”), which are especially relevant to cyber-physical systems
(e.g., for a driverless car the ML algorithm used for navigation has already been
trained by the manufacturer once the “car is on the road”). These attacks are
more realistic and are distinct from other types of attacks on ML models, such
as attacks that poison the training data (see the paper [18] for a survey of such
attacks). We survey attacks caused by adversarial examples, which are inputs
crafted by adding small, often imperceptible, perturbations to force a trained
ML model to misclassify.

We contend that the work on adversarial ML, while important and useful, 
is not enough. In particular, we advocate for the increased use of semantics in 
adversarial analysis and design of ML algorithms. Semantic adversarial learning
explores a space of semantic modifications to the data, uses system-level
semantic specifications in the analysis, utilizes semantic adversarial examples in
training, and produces not just output labels but also additional semantic
information. Focusing on deep learning, we explore these ideas and provide initial
experimental data to support them. 


Roadmap. Section 2 provides the relevant background. A brief survey of
adversarial analysis is given in Sect. 3. Our proposal for semantic adversarial learning
is given in Sect. 4.


2 Background 


Background on Machine Learning. Next we describe some general concepts
in machine learning (ML). We will consider the supervised learning setting.
Consider a sample space $Z$ of the form $X \times Y$, and an ordered training set
$S = ((x_i, y_i))_{i=1}^{m}$ ($x_i$ is the data and $y_i$ is the corresponding label). Let $H$ be
a hypothesis space (e.g., weights corresponding to a logistic-regression model).
There is a loss function $\ell : H \times Z \to \mathbb{R}$ so that given a hypothesis $w \in H$ and a
sample $(x, y) \in Z$, we obtain a loss $\ell(w, (x, y))$. We consider the case where we
want to minimize the loss over the training set $S$,


$$L_S(w) = \frac{1}{m} \sum_{i=1}^{m} \ell(w, (x_i, y_i)) + \lambda R(w).$$


In the equation given above, $\lambda > 0$ and the term $R(w)$ is called the regularizer
and enforces “simplicity” in $w$. Since $S$ is fixed, we sometimes denote $\ell_i(w) =
\ell(w, (x_i, y_i))$ as a function only of $w$. We wish to find a $w$ that minimizes $L_S(w)$,
that is, we wish to solve the following optimization problem:


$$\min_{w \in H} L_S(w)$$
Example: We will consider the example of logistic regression. In this case $X =
\mathbb{R}^n$, $Y = \{+1, -1\}$, $H = \mathbb{R}^n$, and the loss function $\ell(w, (x, y))$ is as follows ($\cdot$
represents the dot product of two vectors):


$$\log\left(1 + e^{-y (w^T \cdot x)}\right)$$


If we use the $L_2$ regularizer (i.e. $R(w) = \|w\|_2$), then $L_S(w)$ becomes:


$$\frac{1}{m} \sum_{i=1}^{m} \log\left(1 + e^{-y_i (w^T \cdot x_i)}\right) + \lambda \|w\|_2$$
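As a rough illustration of the regularized objective above, the following sketch evaluates $L_S(w)$ for logistic regression in plain Python. The toy training set, the function names, and the value of `lam` (standing in for $\lambda$) are assumptions made for this example only.

```python
import math

def logistic_loss(w, x, y):
    """Loss log(1 + exp(-y * (w . x))) for one sample, with y in {+1, -1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # Evaluate log(1 + exp(-margin)) without overflow for large |margin|.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def regularized_loss(w, samples, lam):
    """L_S(w): mean sample loss plus lam times the L2 norm of w."""
    mean_loss = sum(logistic_loss(w, x, y) for x, y in samples) / len(samples)
    return mean_loss + lam * math.sqrt(sum(wi * wi for wi in w))

# Hypothetical toy training set S with m = 3 samples in R^2.
samples = [([1.0, 2.0], 1), ([-1.0, -1.5], -1), ([0.5, -2.0], -1)]
loss_at_zero = regularized_loss([0.0, 0.0], samples, lam=0.1)
loss_at_fit = regularized_loss([0.5, 1.0], samples, lam=0.1)
```

At $w = 0$ every sample contributes $\log 2$ and the regularizer vanishes, so `loss_at_zero` equals $\log 2$; a hypothesis that separates the toy data, such as `[0.5, 1.0]`, yields a smaller value.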


Stochastic Gradient Descent. Stochastic Gradient Descent (SGD) is a popular
method for solving optimization tasks (such as the optimization problem
$\min_{w \in H} L_S(w)$ we considered before). In a nutshell, SGD performs a series of
updates where each update is a gradient descent update with respect to a small
set of points sampled from the training set. Specifically, suppose that we perform
SGD $T$ times. There are two typical forms of SGD: in the first form, which we
call Sample-SGD, we uniformly and randomly sample $i_t \sim [m]$ at time $t$, and
perform a gradient descent based on the $i_t$-th sample $(x_{i_t}, y_{i_t})$:


$$w_{t+1} = G_{\ell_t, \eta_t}(w_t) = w_t - \eta_t \ell'_{i_t}(w_t) \tag{1}$$


where $w_t$ is the hypothesis at time $t$, $\eta_t$ is a parameter called the learning rate,
and $\ell'_{i_t}(w_t)$ denotes the derivative of $\ell_{i_t}(w)$ evaluated at $w_t$. We will denote $G_{\ell_t, \eta_t}$
as $G_t$. In the second form, which we call Perm-SGD, we first perform a random
permutation of $S$, and then apply Eq. 1 $T$ times by cycling through $S$ according
to the order of the permutation. The process of SGD can be summarized as a
diagram:


$$w_0 \xrightarrow{G_1} w_1 \xrightarrow{G_2} \cdots \xrightarrow{G_t} w_t \xrightarrow{G_{t+1}} \cdots \xrightarrow{G_T} w_T$$
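The two forms of SGD can be sketched as follows for the logistic loss. This is an illustrative sketch only: it assumes a constant learning rate `eta`, drops the regularizer ($\lambda = 0$), and uses a made-up separable data set; the function names are not from the paper.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def grad_w(w, x, y):
    """Gradient in w of log(1 + exp(-y * (w . x)))."""
    coeff = -y * sigmoid(-y * dot(w, x))
    return [coeff * xi for xi in x]

def sgd(samples, steps, eta, mode="sample", seed=0):
    """Run T = steps updates of Eq. 1 using Sample-SGD or Perm-SGD."""
    rng = random.Random(seed)
    m = len(samples)
    w = [0.0] * len(samples[0][0])
    order = list(range(m))
    if mode == "perm":
        rng.shuffle(order)            # one random permutation of S
    for t in range(steps):
        if mode == "sample":
            i = rng.randrange(m)      # i_t drawn uniformly from [m]
        else:
            i = order[t % m]          # cycle through S in permuted order
        x, y = samples[i]
        g = grad_w(w, x, y)
        w = [wi - eta * gi for wi, gi in zip(w, g)]   # the update G_t
    return w

# Hypothetical linearly separable data in R^2.
data = [([2.0, 1.0], 1), ([1.5, 2.0], 1), ([-1.0, -2.0], -1), ([-2.0, -0.5], -1)]
w_sample = sgd(data, steps=200, eta=0.5, mode="sample")
w_perm = sgd(data, steps=200, eta=0.5, mode="perm")
```

Both variants share the same update rule and differ only in how the index $i_t$ is chosen at each step.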


Classifiers. The output of the learning algorithm gives us a classifier, which is
a function from $\mathbb{R}^n$ to $C$, where $\mathbb{R}$ denotes the set of reals and $C$ is the set of class
labels. To emphasize that a classifier depends on a hypothesis $w \in H$, which is
the output of the learning algorithm described earlier, we will write it as $F_w$ (if
$w$ is clear from the context, we will sometimes simply write $F$). For example,
after training in the case of logistic regression we obtain a function from $\mathbb{R}^n$
to $\{-1, +1\}$. Vectors will be denoted in boldface, and the $r$-th component of a
vector $\mathbf{x}$ is denoted by $\mathbf{x}[r]$.

Throughout the paper, we refer to the function $s(F_w)$ as the softmax layer
corresponding to the classifier $F_w$. In the case of logistic regression, $s(F_w)(\mathbf{x})$ is
the following tuple (the first element is the probability of $-1$ and the second one
is the probability of $+1$):


$$\left( \frac{1}{1 + e^{w^T \cdot \mathbf{x}}},\ \frac{1}{1 + e^{-w^T \cdot \mathbf{x}}} \right)$$
Formally, let $c = |C|$ and let $F_w$ be a classifier. We let $s(F_w)$ be the function that
maps $\mathbb{R}^n$ to $\mathbb{R}_+^c$ such that $\|s(F_w)(\mathbf{x})\|_1 = 1$ for any $\mathbf{x}$ (i.e., $s(F_w)$ computes a
probability vector). We denote by $s(F_w)(\mathbf{x})[l]$ the probability of $s(F_w)(\mathbf{x})$ at
label $l$. Recall that the softmax function maps $\mathbb{R}^k$ to a probability distribution
over $\{1, \cdots, k\} = [k]$ such that the probability of $j \in [k]$ for a vector $\mathbf{x} \in \mathbb{R}^k$ is


$$\frac{e^{\mathbf{x}[j]}}{\sum_{r=1}^{k} e^{\mathbf{x}[r]}}$$


Some classifiers $F_w(\mathbf{x})$ are of the form $\arg\max_l s(F_w)(\mathbf{x})[l]$ (i.e., the classifier
$F_w$ outputs the label with the maximum probability according to the “softmax
layer”). For example, in several deep-neural network (DNN) architectures the
last layer is the softmax layer. We assume that the reader is familiar with the
basics of deep-neural networks (DNNs). Readers not familiar with DNNs can
refer to the excellent book by Goodfellow et al. [15].
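The softmax layer and the arg-max classifier built on it can be sketched as below. The sketch also checks numerically that the logistic-regression tuple given earlier coincides with the softmax of the two logits $(0, w^T \cdot \mathbf{x})$; the function names and the sample values are illustrative assumptions, and labels are indexed from 0 in the code.

```python
import math

def softmax(z):
    """Map a real vector to a probability vector."""
    shift = max(z)  # subtracting the max avoids overflow; the result is unchanged
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def classify(probs, labels):
    """Return the label with the maximum probability."""
    best = max(range(len(probs)), key=lambda l: probs[l])
    return labels[best]

def logistic_softmax_layer(w, x):
    """(P(-1), P(+1)) for logistic regression, as softmax over logits (0, w . x)."""
    a = sum(wi * xi for wi, xi in zip(w, x))
    return softmax([0.0, a])

probs = logistic_softmax_layer([1.0, -2.0], [0.5, 0.1])   # w . x = 0.3
label = classify(probs, labels=[-1, +1])
```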


Background on Logic. Temporal logics are commonly used for specifying 
desired and undesired properties of systems. For cyber-physical systems, it is 
common to use temporal logics that can specify properties of real-valued signals 
over real time, such as signal temporal logic (STL) [30] or metric temporal logic 
(MTL) [27]. 

A signal is a function $s : D \to S$, with $D \subseteq \mathbb{R}_{\ge 0}$ an interval and either $S \subseteq \mathbb{B}$
or $S \subseteq \mathbb{R}$, where $\mathbb{B} = \{\top, \bot\}$ and $\mathbb{R}$ is the set of reals. Signals defined on $\mathbb{B}$ are
called Boolean, while those on $\mathbb{R}$ are called real-valued. A trace $w = \{s_1, \ldots, s_n\}$
is a finite set of real-valued signals defined over the same interval $D$. We use
variables $x_i$ to denote the value of a real-valued signal at a particular time
instant.

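Let $\Sigma = \{\sigma_1, \ldots, \sigma_k\}$ be a finite set of predicates $\sigma_i : \mathbb{R}^n \to \mathbb{B}$, with $\sigma_i \equiv
p_i(x_1, \ldots, x_n) \lhd 0$, $\lhd \in \{<, \le\}$, and $p_i : \mathbb{R}^n \to \mathbb{R}$ a function in the variables
$x_1, \ldots, x_n$. An STL formula is defined by the following grammar:


$$\varphi := \sigma \mid \neg \varphi \mid \varphi \wedge \varphi \mid \varphi \, U_I \, \varphi \tag{2}$$


where $\sigma \in \Sigma$ is a predicate and $I \subset \mathbb{R}_{\ge 0}$ is a closed non-singular interval. Other
common temporal operators can be defined as syntactic abbreviations in the
usual way, for instance $\varphi_1 \vee \varphi_2 := \neg(\neg \varphi_1 \wedge \neg \varphi_2)$, $F_I \varphi := \top \, U_I \, \varphi$, or $G_I \varphi :=
\neg F_I \neg \varphi$. Given a $t \in \mathbb{R}_{\ge 0}$, a shifted interval $I$ is defined as $t + I = \{t + t' \mid t' \in I\}$.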
The qualitative (or Boolean) semantics of STL is given in the usual way: 


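Definition 1 (Qualitative semantics). Let $w$ be a trace, $t \in \mathbb{R}_{\ge 0}$, and $\varphi$ be
an STL formula. The qualitative semantics of $\varphi$ is inductively defined as follows:


$$\begin{array}{lcl}
w, t \models \sigma & \text{iff} & \sigma(w(t)) \text{ is true} \\
w, t \models \neg \varphi & \text{iff} & w, t \not\models \varphi \\
w, t \models \varphi_1 \wedge \varphi_2 & \text{iff} & w, t \models \varphi_1 \text{ and } w, t \models \varphi_2 \\
w, t \models \varphi_1 \, U_I \, \varphi_2 & \text{iff} & \exists t' \in t + I \text{ s.t. } w, t' \models \varphi_2 \text{ and } \forall t'' \in [t, t'],\ w, t'' \models \varphi_1
\end{array}$$


A trace $w$ satisfies a formula $\varphi$ if and only if $w, 0 \models \varphi$, in short $w \models \varphi$.
STL also admits a quantitative or robust semantics, which we omit for brevity.
This provides quantitative information on the formula, telling how strongly the
specification is satisfied or violated for a given trace.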
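To make the qualitative semantics concrete, here is a small sketch that checks formulas on a uniformly sampled, finite trace. This is a simplification: the definition above is over continuous time, whereas the sketch assumes integer time steps and clips intervals at the end of the trace. The tuple encoding of formulas and all names are hypothetical.

```python
def sat(phi, trace, t):
    """Check w, t |= phi on a sampled trace (a list of tuples of signal values)."""
    op = phi[0]
    if op == "pred":
        return phi[1](trace[t])
    if op == "not":
        return not sat(phi[1], trace, t)
    if op == "and":
        return sat(phi[1], trace, t) and sat(phi[2], trace, t)
    if op == "until":
        (a, b), left, right = phi[1], phi[2], phi[3]
        last = min(t + b, len(trace) - 1)   # clip t + I to the finite trace
        for t2 in range(t + a, last + 1):
            holds_until = all(sat(left, trace, t3) for t3 in range(t, t2 + 1))
            if sat(right, trace, t2) and holds_until:
                return True
        return False
    raise ValueError("unknown operator: " + str(op))

TRUE = ("pred", lambda values: True)

def eventually(interval, phi):
    """F_I phi, defined as (true U_I phi)."""
    return ("until", interval, TRUE, phi)

def always(interval, phi):
    """G_I phi, defined as not F_I not phi."""
    return ("not", eventually(interval, ("not", phi)))

# Predicate of the form p(x) < 0 with p(x) = x[0] - 30 (e.g., "speed below 30").
below_30 = ("pred", lambda values: values[0] - 30 < 0)
trace = [(10,), (20,), (25,), (35,), (15,)]

holds_early = sat(always((0, 2), below_30), trace, 0)
holds_late = sat(always((0, 3), below_30), trace, 0)
exceeds = sat(eventually((0, 4), ("not", below_30)), trace, 0)
```

Here `holds_early` is true because the first three samples stay below 30, while `holds_late` is false because the fourth sample violates the predicate.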


3 Attacks 


There are several types of attacks on ML algorithms. For excellent material on 
various attacks on ML algorithms we refer the reader to [3,18]. For example, in 
training time attacks an adversary wishes to poison a data set so that a “bad”
hypothesis is learned by an ML algorithm. This attack can be modeled as a game
between the algorithm $ML$ and an adversary $A$ as follows:


- $ML$ picks an ordered training set $S = ((x_i, y_i))_{i=1}^{m}$.
- $A$ picks an ordered training set $\hat{S} = ((\hat{x}_i, \hat{y}_i))_{i=1}^{r}$, where $r$ is $\lfloor \epsilon m \rfloor$.
- $ML$ learns on $S \cup \hat{S}$ by essentially minimizing


$$\min_{w \in H} L_{S \cup \hat{S}}(w).$$

The attacker wants to maximize the above quantity and thus chooses $\hat{S}$ such
that $\min_{w \in H} L_{S \cup \hat{S}}(w)$ is maximized. For a recent paper on certified defenses for
such attacks we refer the reader to [44]. In model extraction attacks an adversary
with black-box access to a classifier, but no prior knowledge of the parameters
of an ML algorithm or training data, aims to duplicate the functionality of (i.e.,
steal) the classifier by querying it on well-chosen data points. For an example of
model-extraction attacks, see [45].

In this paper, we consider test-time attacks. We assume that the classifier
$F_w$ has been trained without any interference from the attacker (i.e., no training
time attacks). Roughly speaking, an attacker has an image $\mathbf{x}$ (e.g., an image of
a stop sign) and wants to craft a perturbation $\delta$ so that the label of $\mathbf{x} + \delta$ is what
the attacker desires (e.g., yield sign). The next sub-section describes test-time
attacks in detail. We will sometimes refer to $F_w$ as simply $F$, but the hypothesis
$w$ is lurking in the background (i.e., whenever we refer to $w$, it corresponds to
the classifier $F$).


3.1 Test-Time Attacks 


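The adversarial goal is to take any input vector $\mathbf{x} \in \mathbb{R}^n$ and produce a minimally
altered version of $\mathbf{x}$, an adversarial sample denoted by $\mathbf{x}^\star$, that has the property of
being misclassified by a classifier $F : \mathbb{R}^n \to C$. Formally speaking, an adversary
wishes to solve the following optimization problem:


$$\begin{aligned}
\min_{\delta \in \mathbb{R}^n}\ & \mu(\delta) \\
\text{such that}\ & F(\mathbf{x} + \delta) \in T \\
& \delta \cdot M = 0
\end{aligned}$$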
The terms in the formulation are as follows: $\mu$ is a metric on $\mathbb{R}^n$, $T \subseteq C$ is a
subset of the labels (the reader should think of $T$ as the target labels for the
attacker), and $M$ (called the mask) is an $n$-dimensional 0-1 vector.
The objective function minimizes the metric $\mu$ on the perturbation $\delta$. Next we
describe various constraints in the formulation.


- $F(\mathbf{x} + \delta) \in T$
  The set $T$ constrains the perturbed vector $\mathbf{x} + \delta$¹ to have the label (according
  to $F$) in the set $T$. For mis-classification problems the labels of $\mathbf{x}$ and $\mathbf{x} + \delta$
  are different, so we have $T = C - \{F(\mathbf{x})\}$. For targeted mis-classification we
  have $T = \{t\}$ (for $t \in C$), where $t$ is the target that an attacker wants (e.g.,
  the attacker wants $t$ to correspond to a yield sign).

- $\delta \cdot M = 0$
  The vector $M$ can be considered as a mask, with the product taken
  component-wise: if $M[i] = 1$ then $\delta[i]$ is forced to be 0.
  Essentially the attacker can only perturb dimension $i$ if the $i$-th component
  of $M$ is 0, which means that $\delta$ lies in a $k$-dimensional space where $k$ is the
  number of zero entries in $M$. This constraint is important if an attacker
  wants to target a certain area of the image (e.g., the glasses in a picture of a
  person) to perturb.

- Convexity
  Notice that even if the metric $\mu$ is convex (e.g., $\mu$ is the $L_2$ norm), because of
  the constraint involving $F$, the optimization problem is not convex (the
  constraint $\delta \cdot M = 0$ is convex). In general, solving convex optimization problems
  is more tractable than solving non-convex optimization problems [34].


Note that the constraint $\delta \cdot M = 0$ essentially constrains the vector to be in a
lower-dimensional space and does not add additional complexity to the optimization
problem. Therefore, for the rest of the section we will ignore that constraint and
work with the following formulation:


$$\begin{aligned}
\min_{\delta \in \mathbb{R}^n}\ & \mu(\delta) \\
\text{such that}\ & F(\mathbf{x} + \delta) \in T
\end{aligned}$$

FGSM Mis-classification Attack - This algorithm is also known as the fast
gradient sign method (FGSM) [16]. The adversary crafts an adversarial sample
$\mathbf{x}^\star = \mathbf{x} + \delta$ for a given legitimate sample $\mathbf{x}$ by computing the following
perturbation:


$$\delta = \epsilon \, \mathrm{sign}(\nabla_{\mathbf{x}} L_F(\mathbf{x})) \tag{4}$$


The function $L_F(\mathbf{x})$ is a shorthand for $\ell(w, \mathbf{x}, l(\mathbf{x}))$, where $w$ is the hypothesis
corresponding to the classifier $F$, $\mathbf{x}$ is the data point and $l(\mathbf{x})$ is the label of
$\mathbf{x}$ (essentially we evaluate the loss function at the hypothesis corresponding to
the classifier). The gradient of the function $L_F$ is computed with respect to
$\mathbf{x}$ using sample $\mathbf{x}$ and label $y = l(\mathbf{x})$ as inputs. Note that $\nabla_{\mathbf{x}} L_F(\mathbf{x})$ is an
$n$-dimensional vector and $\mathrm{sign}(\nabla_{\mathbf{x}} L_F(\mathbf{x}))$ is an $n$-dimensional vector whose $i$-th
element is the sign of $\nabla_{\mathbf{x}} L_F(\mathbf{x})[i]$. The value of the input variation parameter
$\epsilon$ scaling the sign vector controls the perturbation's amplitude. Increasing its
value increases the likelihood of $\mathbf{x}^\star$ being misclassified by the classifier $F$ but,
conversely, makes adversarial samples easier to detect by humans. The key idea
is that FGSM takes a step in the direction of the gradient of the loss function
and thus tries to maximize it. Recall that SGD takes a step in the direction that
is opposite to the gradient of the loss function because it is trying to minimize
the loss function.

¹ The vectors are added component-wise.
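The FGSM perturbation of Eq. 4 can be sketched for the logistic-regression classifier from Sect. 2, where the gradient of the loss with respect to the input has a closed form. The weights, the input, and the value of `eps` (standing in for $\epsilon$) are made-up values chosen so that the label flips; the helper names are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sign(v):
    return (v > 0) - (v < 0)

def loss_gradient_x(w, x, y):
    """Gradient in x of log(1 + exp(-y * (w . x)))."""
    coeff = -y * sigmoid(-y * dot(w, x))
    return [coeff * wi for wi in w]

def fgsm(w, x, y, eps):
    """Perturbation delta = eps * sign(grad_x L_F(x))."""
    return [eps * sign(g) for g in loss_gradient_x(w, x, y)]

def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1

w, x, y = [1.0, -2.0], [0.5, 0.1], 1
delta = fgsm(w, x, y, eps=0.2)
x_adv = [xi + di for xi, di in zip(x, delta)]
```

In this toy case `delta` is `[-0.2, 0.2]`: each component moves by exactly `eps` in the direction that increases the loss, and the prediction changes from $+1$ on `x` to $-1$ on `x_adv`.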


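JSMA Targeted Mis-classification Attack - This algorithm is suitable for
targeted misclassification [37]. We refer to this attack as JSMA throughout the
rest of the paper. To craft the perturbation $\delta$, components are sorted by decreasing
adversarial saliency value. The adversarial saliency value $S(\mathbf{x}, t)[i]$ of
component $i$ for an adversarial target class $t$ is defined as:


$$S(\mathbf{x}, t)[i] = \begin{cases}
0 & \text{if } \dfrac{\partial s(F)[t](\mathbf{x})}{\partial \mathbf{x}[i]} < 0 \text{ or } \displaystyle\sum_{j \neq t} \dfrac{\partial s(F)[j](\mathbf{x})}{\partial \mathbf{x}[i]} > 0 \\[2ex]
\dfrac{\partial s(F)[t](\mathbf{x})}{\partial \mathbf{x}[i]} \left| \displaystyle\sum_{j \neq t} \dfrac{\partial s(F)[j](\mathbf{x})}{\partial \mathbf{x}[i]} \right| & \text{otherwise}
\end{cases} \tag{5}$$


where matrix $J_F = \left[ \frac{\partial s(F)[j](\mathbf{x})}{\partial \mathbf{x}[i]} \right]_{ij}$ is the Jacobian matrix for the output of the
softmax layer $s(F)(\mathbf{x})$. Since $\sum_{k \in C} s(F)[k](\mathbf{x}) = 1$, we have the following
equation:


$$\frac{\partial s(F)[t](\mathbf{x})}{\partial \mathbf{x}[i]} = - \sum_{j \neq t} \frac{\partial s(F)[j](\mathbf{x})}{\partial \mathbf{x}[i]}$$


The first case corresponds to the scenario in which changing the $i$-th component of $\mathbf{x}$
takes us further away from the target label $t$. Intuitively, $S(\mathbf{x}, t)[i]$ indicates how
likely changing the $i$-th component of $\mathbf{x}$ is to “move towards” the target
label $t$. Input components $i$ are added to perturbation $\delta$ in order of decreasing
adversarial saliency value $S(\mathbf{x}, t)[i]$ until the resulting adversarial sample $\mathbf{x}^\star =
\mathbf{x} + \delta$ achieves the target label $t$. The perturbation introduced for each selected
input component can vary. Greater individual variations tend to reduce the
number of components perturbed to achieve misclassification.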
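A sketch of the saliency computation in Eq. 5, taking the Jacobian of the softmax layer as given. The Jacobian here is a hypothetical 3-class, 3-component example whose columns sum to zero (as they must, since the softmax outputs sum to 1); in practice it would be obtained by differentiating the network. Rows are indexed by class $j$ and columns by input component $i$.

```python
def saliency_map(jacobian, target):
    """Adversarial saliency S(x, t)[i] for every input component i."""
    n = len(jacobian[0])
    scores = []
    for i in range(n):
        d_target = jacobian[target][i]
        d_others = sum(row[i] for j, row in enumerate(jacobian) if j != target)
        if d_target < 0 or d_others > 0:
            scores.append(0.0)       # changing x[i] moves away from the target
        else:
            scores.append(d_target * abs(d_others))
    return scores

# Hypothetical Jacobian: jacobian[j][i] = d s(F)[j](x) / d x[i].
jacobian = [
    [0.30, -0.20, 0.10],     # target class t = 0
    [-0.10, 0.10, -0.05],
    [-0.20, 0.10, -0.05],
]
scores = saliency_map(jacobian, target=0)
# Components in order of decreasing saliency, as used to build the perturbation.
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

Component 1 receives saliency 0 because increasing it lowers the target-class probability, so it is ranked last.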


CW Targeted Mis-classification Attack. The CW-attack [5] is widely
believed to be one of the most “powerful” attacks. The reason is that CW cast
their problem as an unconstrained optimization problem, and then use a
state-of-the-art solver (i.e., Adam [24]). In other words, they leverage the advances in
optimization for the purposes of generating adversarial examples.

In their paper Carlini and Wagner consider a wide variety of formulations, but
we present the one that performs best according to their evaluation. The
optimization problem corresponding to CW is as follows:


$$\begin{aligned}
\min_{\delta \in \mathbb{R}^n}\ & \mu(\delta) \\
\text{such that}\ & F(\mathbf{x} + \delta) = t
\end{aligned}$$
CW use an existing solver (Adam [24]) and thus need to make sure that each
component of $\mathbf{x} + \delta$ is between 0 and 1 (i.e., valid pixel values). Note that the
other methods did not face this issue because they control the “internals” of the
algorithm (i.e., CW used a solver in a “black box” manner). We introduce a new
vector $w$ whose $i$-th component is defined according to the following equation:


$$\delta[i] = \frac{1}{2} (\tanh(w[i]) + 1) - \mathbf{x}[i]$$

Since $-1 \le \tanh(w[i]) \le 1$, it follows that $0 \le \mathbf{x}[i] + \delta[i] \le 1$. In terms of this
new variable the optimization problem becomes:


$$\begin{aligned}
\min_{w \in \mathbb{R}^n}\ & \mu\left(\tfrac{1}{2}(\tanh(w) + 1) - \mathbf{x}\right) \\
\text{such that}\ & F\left(\tfrac{1}{2}(\tanh(w) + 1)\right) = t
\end{aligned}$$


Next they approximate the constraint ($F(\mathbf{x}) = t$) with the following
function:


$$g(\mathbf{x}) = \max\left( \max_{i \neq t} Z(F)(\mathbf{x})[i] - Z(F)(\mathbf{x})[t],\ -\kappa \right)$$

In the equation given above $Z(F)$ is the input of the DNN to the softmax layer
(i.e., $s(F)(\mathbf{x}) = \mathrm{softmax}(Z(F)(\mathbf{x}))$) and $\kappa$ is a confidence parameter (higher $\kappa$
encourages the solver to find adversarial examples with higher confidence). The
new optimization formulation is as follows:


$$\begin{aligned}
\min_{w \in \mathbb{R}^n}\ & \mu\left(\tfrac{1}{2}(\tanh(w) + 1) - \mathbf{x}\right) \\
\text{such that}\ & g\left(\tfrac{1}{2}(\tanh(w) + 1)\right) \le 0
\end{aligned}$$


Next we incorporate the constraint into the objective function as follows:

$$\min_{w \in \mathbb{R}^n}\ \mu\left(\tfrac{1}{2}(\tanh(w) + 1) - \mathbf{x}\right) + c \, g\left(\tfrac{1}{2}(\tanh(w) + 1)\right)$$


In the objective given above, the “Lagrangian variable” $c > 0$ is a suitably chosen
constant (from the optimization literature we know that there exists $c > 0$ such
that the optimal solutions of the last two formulations are the same).
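The pieces of the final CW objective can be sketched as follows. The sketch only evaluates the objective; it does not include a solver. It assumes the $L_2$ norm for the metric $\mu$, and `logits_fn` is a hypothetical stand-in for $Z(F)$ (here a trivial function that returns its input as the logits).

```python
import math

def to_input(w):
    """Map unconstrained w to a point in [0, 1]^n via (tanh(w) + 1) / 2."""
    return [0.5 * (math.tanh(wi) + 1.0) for wi in w]

def perturbation(w, x):
    """delta[i] = (tanh(w[i]) + 1) / 2 - x[i]."""
    return [xi_new - xi for xi_new, xi in zip(to_input(w), x)]

def g(logits, target, kappa):
    """max(max_{i != t} Z[i] - Z[t], -kappa); it is <= 0 when no other logit exceeds Z[t]."""
    best_other = max(z for i, z in enumerate(logits) if i != target)
    return max(best_other - logits[target], -kappa)

def cw_objective(w, x, logits_fn, target, c, kappa=0.0):
    """mu(delta) + c * g(x + delta), with mu taken to be the L2 norm (assumption)."""
    delta = perturbation(w, x)
    mu = math.sqrt(sum(d * d for d in delta))
    return mu + c * g(logits_fn(to_input(w)), target, kappa)

# Hypothetical stand-in for Z(F): the logits are the input itself.
logits_fn = lambda x: list(x)

x = [0.5, 0.5]
value_at_x = cw_objective([0.0, 0.0], x, logits_fn, target=1, c=1.0)
value_moved = cw_objective([-1.0, 1.0], x, logits_fn, target=1, c=1.0, kappa=0.5)
```

Because `to_input` always lands in $[0, 1]^n$, any unconstrained optimizer can be applied to `w` without a separate box constraint on the pixel values.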


3.2 Adversarial Training 


Once an attacker finds an adversarial example, the algorithm can be
retrained using this example. Researchers have found that retraining the model
with adversarial examples produces a more robust model. For this section, we
will work with attack algorithms that have a target label $t$ (i.e., we are in the
targeted mis-classification case, such as JSMA or CW). Let $\mathcal{A}(w, \mathbf{x}, t)$ be the
attack algorithm, where its inputs are as follows: $w \in H$ is the current
hypothesis, $\mathbf{x}$ is the data point, and $t \in C$ is the target label. The output of $\mathcal{A}(w, \mathbf{x}, t)$ is
a perturbation $\delta$ such that $F(\mathbf{x} + \delta) = t$. If the attack algorithm is simply a
mis-classification algorithm (e.g., FGSM or Deepfool) we will drop the last parameter
$t$.
An adversarial training algorithm $\mathcal{R}_{\mathcal{A}}(w, \mathbf{x}, t)$ is parameterized by an attack
algorithm $\mathcal{A}$ and outputs a new hypothesis $w' \in H$. Adversarial training works
by taking a datapoint $\mathbf{x}$ and an attack algorithm $\mathcal{A}(w, \mathbf{x}, t)$ as its input and then
retraining the model using a specially designed loss function (essentially one
performs a single step of the SGD using the new loss function). The question
arises: what loss function to use during the training? Different methods use
different loss functions.

Next, we discuss some adversarial training algorithms proposed in the
literature. At a high level, an important point is that the more sophisticated an
adversarial perturbation algorithm is, the harder it is to turn it into adversarial
training. The reason is that it is hard to “encode” the adversarial perturbation
algorithm as an objective function and optimize it. We will see this below,
especially for the virtual adversarial training (VAT) proposed by Miyato et al. [32].


Retraining for FGSM. We discussed the FGSM attack method earlier. In
this case $\mathcal{A} = \mathrm{FGSM}$. The loss function used by the retraining algorithm
$\mathcal{R}_{\mathrm{FGSM}}(w, \mathbf{x}, t)$ is as follows:


$$\ell_{\mathrm{FGSM}}(w, \mathbf{x}_i, y_i) = \ell(w, \mathbf{x}_i, y_i) + \lambda \, \ell(w, \mathbf{x}_i + \mathrm{FGSM}(w, \mathbf{x}_i), y_i)$$


Recall that $\mathrm{FGSM}(w, \mathbf{x})$ was defined earlier, and $\lambda$ is a regularization parameter.
The simplicity of $\mathrm{FGSM}(w, \mathbf{x}_i)$ allows taking its gradient, but this objective
function requires label $y_i$ because we are reusing the same loss function $\ell$ used
to train the original model. Further, $\mathrm{FGSM}(w, \mathbf{x}_i)$ may not be very good because
it may not produce a good adversarial perturbation direction (i.e., taking a bigger
step in this direction might produce a distorted image). The retraining algorithm
is simply as follows: take one step in the SGD using the loss function $\ell_{\mathrm{FGSM}}$ at
the data point $\mathbf{x}_i$.
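One such retraining step can be sketched for the logistic-regression example. The sketch holds the FGSM perturbation fixed while differentiating with respect to the weights, which is a simplifying assumption, and it uses a constant learning rate `eta`; `lam` stands in for $\lambda$. All names and numeric values are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, x, y, eps):
    """eps * sign of the gradient in x of log(1 + exp(-y * (w . x)))."""
    coeff = -y * sigmoid(-y * dot(w, x))
    return [eps * sign(coeff * wi) for wi in w]

def grad_w(w, x, y):
    """Gradient in w of log(1 + exp(-y * (w . x)))."""
    coeff = -y * sigmoid(-y * dot(w, x))
    return [coeff * xi for xi in x]

def retrain_step(w, x, y, eps, lam, eta):
    """One SGD step on l(w, x, y) + lam * l(w, x + FGSM(w, x), y)."""
    delta = fgsm(w, x, y, eps)            # computed once, then held constant
    x_adv = [xi + di for xi, di in zip(x, delta)]
    g_clean = grad_w(w, x, y)
    g_adv = grad_w(w, x_adv, y)
    w_new = [wi - eta * (gc + lam * ga) for wi, gc, ga in zip(w, g_clean, g_adv)]
    return w_new, x_adv

w, x, y = [1.0, -2.0], [0.5, 0.1], 1
w_new, x_adv = retrain_step(w, x, y, eps=0.2, lam=1.0, eta=0.1)
```

After the step, the margin $y \, (w \cdot \mathbf{x}_{adv})$ on the adversarial point is larger than before, which is the intended effect of the extra loss term. Note that the label `y` is needed to evaluate both terms.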

A caveat is needed for taking the gradient during the SGD step. At iteration $t$
suppose we have model parameters $w_t$ and we need to compute the gradient of
the objective. Note that $\mathrm{FGSM}(w, \mathbf{x})$ depends on $w$, so by the chain rule we need
to compute $\partial \mathrm{FGSM}(w, \mathbf{x}) / \partial w |_{w = w_t}$. However, this gradient is volatile², and so
instead Goodfellow et al. only compute:


$$\left. \frac{\partial \ell(w, \mathbf{x}_i + \mathrm{FGSM}(w_t, \mathbf{x}_i), y_i)}{\partial w} \right|_{w = w_t}$$

Essentially they treat $\mathrm{FGSM}(w_t, \mathbf{x}_i)$ as a constant while taking the derivative.

² In general, second-order derivatives of a classifier corresponding to a DNN vanish at
several points because several layers are piece-wise linear.


Virtual Adversarial Training (VAT). Miyato et al. [32] observed the
drawback of requiring label $y_i$ for the adversarial example. Their intuition is that one
wants the classifier to behave “similarly” on $\mathbf{x}$ and $\mathbf{x} + \delta$, where $\delta$ is the adversarial
perturbation. Specifically, the distance between the distributions corresponding to the
output of the softmax layer $s(F_w)$ on $\mathbf{x}$ and $\mathbf{x} + \delta$ should be small. VAT uses
Kullback-Leibler (KL) divergence as the measure of the distance between two
distributions. Recall that the KL divergence of two distributions $P$ and $Q$ over the
same finite domain $D$ is given by the following equation:


$$\mathrm{KL}(P, Q) = \sum_{i \in D} P(i) \log \frac{P(i)}{Q(i)}$$
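The KL divergence above translates directly into code. This sketch uses the usual convention that terms with $P(i) = 0$ contribute nothing and assumes $Q(i) > 0$ wherever $P(i) > 0$; the example distributions are made up.

```python
import math

def kl_divergence(p, q):
    """KL(P, Q) = sum_i P(i) * log(P(i) / Q(i)) over a shared finite domain."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.25, 0.75]
kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)
```

The divergence is zero when the two distributions coincide and is not symmetric in general, so `kl_pq` and `kl_qp` differ.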


Therefore, instead of reusing $\ell$, they propose to use the
following for the regularizer,


$$\Delta(r, \mathbf{x}, w) = \mathrm{KL}\left( s(F_w)(\mathbf{x})[y],\ s(F_w)(\mathbf{x} + r)[y] \right)$$


for some $r$ such that $\|r\| \le \delta$. As a result, the label $y_i$ is no longer required. The
question is: what $r$ to use? Miyato et al. [32] propose that in theory we should
use the “best” one as


$$\max_{r : \|r\| \le \delta} \mathrm{KL}\left( s(F_w)(\mathbf{x})[y],\ s(F_w)(\mathbf{x} + r)[y] \right)$$


This thus gives rise to the following loss function to use during retraining:


$$\ell_{\mathrm{VAT}}(w, \mathbf{x}_i, y_i) = \ell(w, \mathbf{x}_i, y_i) + \lambda \max_{r : \|r\| \le \delta} \Delta(r, \mathbf{x}_i, w)$$
However, one cannot easily compute the gradient for the regularizer. Hence the
authors perform an approximation as follows:

1. Compute the Taylor expansion of $\Delta(r, x_i, w)$ at $r = 0$, so $\Delta(r, x_i, w) =
   r^T H(x_i, w)\, r$, where $H(x_i, w)$ is the Hessian matrix of $\Delta(r, x_i, w)$ with respect
   to $r$ at $r = 0$.

2. Thus $\max_{\|r\| \le \delta} \Delta(r, x_i, w) = \max_{\|r\| \le \delta} (r^T H(x_i, w)\, r)$. By the variational
   characterization of the eigenvalues of a symmetric matrix ($H(x_i, w)$ is symmetric),
   $r^* = \delta \bar{v}$, where $\bar{v} = \bar{v}(x_i, w)$ is the unit eigenvector of $H(x_i, w)$ corresponding
   to its largest eigenvalue. Note that $r^*$ depends on $x_i$ and $w$. Therefore the loss
   function becomes:

   $$\ell_{\mathrm{VAT}}(w, x_i, y_i) = \ell(w, x_i, y_i) + \lambda \Delta(r^*, x_i, w)$$

3. Now suppose in the process of SGD we are at iteration $t$ with model parameters
   $w_t$, and we need to compute $\partial \ell_{\mathrm{VAT}}/\partial w\,|_{w=w_t}$. By the chain rule we need
   to compute $\partial r^*/\partial w\,|_{w=w_t}$. However, the authors find that such gradients are
   volatile, so they instead fix $r^*$ as a constant at the point $w_t$, and compute

   $$\left.\frac{\partial\, \mathrm{KL}\big(s(F_w)(x)[y],\; s(F_w)(x + r^*)[y]\big)}{\partial w}\right|_{w=w_t}$$
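Step 2 above reduces the inner maximization to finding a dominant eigenvector. A minimal sketch of that step follows, using plain power iteration on an explicitly given matrix. The explicit matrix `H`, the starting vector, and the iteration count are assumptions for illustration; in practice the Hessian of a DNN is not formed explicitly, and this sketch makes no claim about how the original authors implement the step.

```python
import math

def mat_vec(H, v):
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dominant_eigenvector(H, iters=200):
    """Power iteration. Assumes H is symmetric positive semidefinite with a
    unique largest eigenvalue and that the start vector is not orthogonal
    to the corresponding eigenvector."""
    v = normalize([1.0] + [0.5] * (len(H) - 1))   # arbitrary nonzero start
    for _ in range(iters):
        v = normalize(mat_vec(H, v))
    return v

def best_perturbation(H, delta):
    """r* = delta * (unit eigenvector for the largest eigenvalue of H)."""
    return [delta * x for x in dominant_eigenvector(H)]

def quadratic_form(H, r):
    return sum(ri * hi for ri, hi in zip(r, mat_vec(H, r)))
```

For a symmetric matrix with largest eigenvalue $\lambda_{\max}$, the value of the quadratic form at `best_perturbation(H, delta)` is $\delta^2 \lambda_{\max}$, which is the maximum over the ball $\|r\| \le \delta$.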


3.3 Black Box Attacks 


Recall that earlier attacks (e.g. FGSM and JSMA) needed white-box access to
the classifier $F$ (essentially because these attacks require first-order information
about the classifier). In this section, we present black-box attacks. In this case, an
attacker can only ask for the labels $F(x)$ for certain data points. Our presentation
is based on [36], but is more general.

Let $A(w, x, t)$ be the attack algorithm, whose inputs are the current hypothesis
$w \in H$, the data point $x$, and the target label $t \in C$. The output
of $A(w, x, t)$ is a perturbation $\delta$ such that $F(x + \delta) = t$. If the attack algorithm
is simply a mis-classification algorithm (e.g. FGSM or Deepfool) we will drop
the last parameter $t$ (recall that in this case the attack algorithm returns a $\delta$
such that $F(x + \delta) \neq F(x)$). An adversarial training algorithm $R_A(w, x, t)$ is
parameterized by an attack algorithm $A$ and outputs a new hypothesis $w' \in H$
(this was discussed in the previous subsection).


Initialization: We pick a substitute classifier $G$ and an initial seed data set $S_0$
and train $G$. For simplicity, we will assume that the sample space $Z = X \times Y$
and the hypothesis space $H$ for $G$ are the same as those of $F$ (the classifier under
attack). However, this is not crucial to the algorithm. We will call $G$ the substitute
classifier and $F$ the target classifier. Let $S = S_0$ be the initial data set, which
will be updated as we iterate.


Iteration: Run the attack algorithm $A(w, x, t)$ on $G$ and obtain a $\delta$. If $F(x + \delta) = t$,
then stop; we are done. If $F(x + \delta) = t'$ with $t' \neq t$, we augment the
data set $S$ as follows:

$$S = S \cup \{(x + \delta, t')\}$$

We now retrain $G$ on this new data set, which essentially means running
SGD on the new data point $(x + \delta, t')$. Notice that we can also use adversarial
training $R_A(w, x, t)$ to update $G$ (to our knowledge this has not been tried out
in the literature).
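The initialization and iteration steps above can be sketched as a short loop. Everything concrete here is an assumption chosen for illustration: the hidden linear target `target_label`, the logistic-regression substitute, the minimal-shift attack on the substitute, and the grid-shaped seed set. Only the structure of the loop (attack the substitute, query the target, augment $S$, retrain) follows the text.

```python
import math

def target_label(x):
    """Stand-in for the black-box target F: the attacker only sees labels."""
    return 1 if x[0] + 2.0 * x[1] > 1.0 else 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def train_substitute(S, epochs=300, lr=0.5):
    """Fit a logistic-regression substitute G (two weights and a bias) on S."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x, label in S:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) - label
            grad = [grad[0] + err * x[0], grad[1] + err * x[1], grad[2] + err]
        w = [wi - lr * gi / len(S) for wi, gi in zip(w, grad)]
    return w

def attack_substitute(w, x, t, margin=0.2):
    """White-box attack A on G: smallest L2 shift that puts x on label t's
    side of G's decision boundary with the given score margin."""
    sign = 1.0 if t == 1 else -1.0
    score = w[0] * x[0] + w[1] * x[1] + w[2]
    shortfall = margin - sign * score
    if shortfall <= 0.0:
        return [0.0, 0.0]
    step = sign * shortfall / (w[0] ** 2 + w[1] ** 2 + 1e-12)
    return [step * w[0], step * w[1]]

def seed_set():
    """Initial data set S_0: a small grid labeled by querying the target."""
    grid = (-1.0, 0.0, 1.0, 2.0)
    return [((a, b), target_label((a, b))) for a in grid for b in grid]

def black_box_attack(x, t, S0, max_iters=20):
    S = list(S0)
    for _ in range(max_iters):
        w = train_substitute(S)                    # (re)train G on S
        delta = attack_substitute(w, x, t)         # attack G, not F
        x_adv = (x[0] + delta[0], x[1] + delta[1])
        label = target_label(x_adv)                # one label query to F
        if label == t:
            return delta, True, S
        S.append((x_adv, label))                   # S = S ∪ {(x + δ, t')}
    return delta, False, S
```

The loop is not guaranteed to succeed within `max_iters`; the returned flag reports whether the final perturbation actually fools the target.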


3.4 Defenses 


Defenses with formal guarantees against test-time attacks have proven elusive. 
For example, Carlini and Wagner [6] have a recent paper that breaks ten recent 
defense proposals. However, defenses that are based on robust-optimization 
objectives have demonstrated promise [26, 33, 43]. Several techniques for verifying 
properties of a DNN (in isolation) have appeared recently (e.g., [12,13,19,23]). 
Due to space limitations we will not give a detailed account of all these defenses. 


4 Semantic Adversarial Analysis and Training 


A central tenet of this paper is that the analysis of deep neural networks (and 
machine learning components, in general) must be more semantic. In particular, 
we advocate for the increased use of semantics in several aspects of adversarial 
analysis and training, including the following:

• Semantic Modification Space: Recall that the goal of adversarial attacks is to
modify an input vector $x$ with an adversarial modification $\delta$ so as to achieve
a target misclassification. Such modifications typically do not incorporate the
application-level semantics or the context within which the neural network is 
deployed. We argue that it is essential to incorporate more application-level, 
contextual semantics into the modification space. Such semantic modifica- 
tions correspond to modifications that may arise more naturally within the 
context of the target application. We view this not as ignoring arbitrary mod- 
ifications (which are indeed worth considering with a security mind set), but 
as prioritizing the design and analysis of DNNs towards semantic adversarial 
modifications. Sect. 4.1 discusses this point in more detail. 

• System-Level Specifications: The goal of much of the work in adversarial
attacks has been to generate misclassifications. However, not all misclassi- 
fications are made equal. We contend that it is important to find misclassifi- 
cations that lead to violations of desired properties of the system within which 
the DNN is used. Therefore, one must identify such system-level specifications 
and devise analysis methods to verify whether an erroneous behavior of the 
DNN component can lead to the violation of a system-level specification. 
System-level counterexamples can be valuable aids to repair and re-design 
machine learning models. See Sect. 4.1 for a more detailed discussion of this 
point. 

• Semantic (Re-) Training: Most machine learning models are trained with the
main goal of reducing misclassifications as measured by a suitably crafted loss 
function. We contend that it is also important to train the model to avoid 
undesirable behaviors at the system level. For this, we advocate using methods 
for semantic training, where system-level specifications, counterexamples, and 
other artifacts are used to improve the semantic quality of the ML model. 
Sect. 4.2 explores a few ideas. 

• Confidence-Based Analysis and Decision Making: Deep neural networks (and
other ML models) often produce not just an output label, but also an asso- 
ciated confidence level. We argue that confidence levels must be used within 
the design of ML-based systems. They provide a way of exposing more infor- 
mation from the DNN to the surrounding system that uses its decisions. Such 
confidence levels can also be useful to prioritize analysis towards cases that 
are more egregious failures of the DNN. More generally, any explanations and
auxiliary information generated by the DNN that accompany its main output 
decisions can be valuable aids in their design and analysis. 


4.1 Compositional Falsification 


We discuss the problem of performing system-level analysis of a deep learning 
component, using recent work by the authors [9, 10] to illustrate the main points. 
The material in this section is mainly based on [40]. 

We begin with some basic notation. Let $S$ denote the model of the full system
under verification, $E$ denote a model of its environment, and $\Phi$ denote the
specification to be verified. $C$ is an ML model (e.g. DNN) that is part of $S$. As
in Sect. 3, let $x$ be an input to $C$. We assume that $\Phi$ is a trace property, i.e., a
set of behaviors of the closed system obtained by composing $S$ with $E$, denoted
$S \| E$. The goal of falsification is to find one or more counterexamples showing
how the composite system $S \| E$ violates $\Phi$. In this context, semantic analysis
of $C$ is about finding a modification $\delta$ from a space of semantic modifications $\Delta$
such that $C$, on $x + \delta$, produces a misclassification that causes $S \| E$ to violate $\Phi$.


[Figure: block diagram with blocks labeled Environment, Sensor Input, Controller,
and Learning-Based Perception.]

Fig. 1. Automatic Emergency Braking System (AEBS) in closed loop. An image
classifier based on deep neural networks is used to perceive objects in the ego vehicle's
frame of view.


Example Problem. As an illustrative example, consider a simple model of an 
Automatic Emergency Braking System (AEBS), that attempts to detect objects 
in front of a vehicle and actuate the brakes when needed to avert a collision. 
Figure 1 shows the AEBS as a system composed of a controller (automatic braking),
a plant (vehicle sub-system under control, including transmission), and an
advanced sensor (camera along with an obstacle detector based on deep learn- 
ing). The AEBS, when combined with the vehicle's environment, forms a closed 
loop control system. The controller regulates the acceleration and braking of the 
plant using the velocity of the subject (ego) vehicle and the distance between it 
and an obstacle. The sensor used to detect the obstacle includes a camera along 
with an image classifier based on DNNs. In general, this sensor can provide noisy 
measurements due to incorrect image classifications which in turn can affect the 
correctness of the overall system. 

Suppose we want to verify whether the distance between the ego vehicle and
a preceding obstacle is always larger than 2 m. In STL, this requirement $\Phi$ can
be written as $G_{0,T}(\|x_{\mathrm{ego}} - x_{\mathrm{obs}}\|_2 > 2)$. Such verification requires the exploration
of a very large input space comprising the control inputs (e.g., acceleration
and braking pedal angles) and the machine learning (ML) component's feature
space (e.g., all the possible pictures observable by the camera). The latter space
is particularly large; for example, the feature space of RGB images of
dimension 1000 × 600 px (for an image classifier) contains $256^{1000 \times 600 \times 3}$ elements.
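To make the safety requirement concrete, the sketch below checks it on a finite, sampled trace. It assumes traces are given as equal-length lists of positions and reports a simple robustness value (the worst-case margin over the trace); the function name and the trace representation are assumptions for illustration and not the interface of any particular STL tool.

```python
import math

def always_min_distance(ego_trace, obs_trace, threshold=2.0):
    """Check that ||x_ego - x_obs||_2 > threshold holds at every sample.

    Returns (satisfied, robustness), where robustness is the minimum over
    the trace of (distance - threshold). A positive value means the
    requirement holds with that much margin; a non-positive value means
    it is violated at some sample.
    """
    margins = [math.dist(e, o) - threshold
               for e, o in zip(ego_trace, obs_trace)]
    robustness = min(margins)
    return robustness > 0.0, robustness
```

Falsification tools typically search for inputs that drive such a robustness value below zero.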

In the above example, $S \| E$ is the closed loop system in Fig. 1 where $S$ comprises
the DNN and the controller, and $E$ comprises everything else. $C$ is the
DNN used for object detection and classification.

This case study has been implemented in Matlab/Simulink³ in two versions
that use two different Convolutional Neural Networks (CNNs): the Caffe [20]
version of AlexNet [28] and the Inception-v3 model created with TensorFlow [31],
both trained on the ImageNet database [1]. Further details about this example
can be obtained from [9].


Approach. A key idea in our approach is to have a system-level verifier that
abstracts away the component $C$ while verifying $\Phi$ on the resulting abstraction.
This system-level verifier communicates with a component-level analyzer that
searches for semantic modifications $\delta$ to the input $x$ of $C$ that could lead to
violations of the system-level specification $\Phi$. Figure 2 illustrates this approach.


[Figure: block diagram with labels: System S, Env. E, Property Φ; System-Level
Analysis; Region of Uncertainty (projected); Component (ML) Analysis;
Component-level errors (misclassifications); Correct / Incorrect (+ counterexamples).]

Fig. 2. Compositional verification approach. A system-level verifier cooperates with
a component-level analysis procedure (e.g., adversarial analysis of a machine learning
component to find misclassifications).


We formalize this approach while trying to emphasize the intuition. Let $T$
denote the set of all possible traces of the composition of the system with its
environment, $S \| E$. Given a specification $\Phi$, let $T_\Phi$ denote the set of traces in
$T$ satisfying $\Phi$. Let $U_\Phi$ denote the projection of these traces onto the state and
interface variables of the environment $E$. $U_\Phi$ is termed the validity domain of
$\Phi$, i.e., the set of environment behaviors for which $\Phi$ is satisfied. Similarly, the
complement set $U_{\neg\Phi}$ is the set of environment behaviors for which $\Phi$ is violated.

Our approach works as follows: 


1. The System-level Verifier initially performs two analyses with two extreme
abstractions of the ML component. First, it performs an optimistic analysis,
wherein the ML component is assumed to be a “perfect classifier”, i.e., all
feature vectors are correctly classified. In situations where ML is used for
perception/sensing, this abstraction assumes perfect perception/sensing. Using
this abstraction, we compute the validity domain for this abstract model of
the system, denoted $U_\Phi^+$. Next, it performs a pessimistic analysis where the
ML component is abstracted by a “completely-wrong classifier”, i.e., all
feature vectors are misclassified. Denote the resulting validity domain as $U_\Phi^-$. It
is expected that $U_\Phi^+ \supseteq U_\Phi^-$.
Abstraction permits the System-level Verifier to operate on a lower-dimensional
search space and identify a region in this space that may be
affected by the malfunctioning of component $C$, a so-called “region of
uncertainty” (ROU). This region, $U_{\mathrm{ROU}}^{C}$, is computed as $U_\Phi^+ \setminus U_\Phi^-$. In other words,
it comprises all environment behaviors that could lead to a system-level
failure when component $C$ malfunctions. This region, projected onto the
inputs of $C$, is communicated to the ML Analyzer. (Concretely, in the context
of our example of Sect. 4.1, this corresponds to finding a subspace of images
that corresponds to $U_{\mathrm{ROU}}^{C}$.)

³ https://github.com/dreossi/analyzeNN.

2. The Component-level Analyzer, also termed a Machine Learning (ML)
Analyzer, performs a detailed analysis of the projected ROU $U_{\mathrm{ROU}}^{C}$. A key
aspect of the ML analyzer is to explore the semantic modification space effi- 
ciently. Several options are available for such an analysis, including the vari- 
ous adversarial analysis techniques surveyed earlier (applied to the semantic 
space), as well as systematic sampling methods [9]. Even though a component- 
level formal specification may not be available, each of these adversarial anal- 
yses has an implicit notion of “misclassification.” We will refer to these as 
component-level errors. The working of the ML analyzer from [9] is shown in 
Fig. 3. 

3. When the Component-level (ML) Analyzer finds component-level errors (e.g., 
those that trigger misclassifications of inputs whose labels are easily inferred), 
it communicates that information back to the System-level Verifier, which 
checks whether the ML misclassification can lead to a violation of the system-level
property $\Phi$. If yes, we have found a system-level counterexample. If
no component-level errors are found, and the system-level verification can
prove the absence of counterexamples, then it can conclude that $\Phi$ is satisfied.
Otherwise, if the ML misclassification cannot be extended to a system-level 
counterexample, the ROU is updated and the revised ROU passed back to 
the Component-level Analyzer. 


The communication between the System-level Verifier and the Component-level 
(ML) Analyzer continues thus, until we either prove/disprove $\Phi$, or we run out
of resources. 
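The optimistic and pessimistic analyses and the resulting region of uncertainty can be sketched on a toy braking model. The dynamics, the parameter grid, and the names `stays_safe`, `validity_domain`, and `component_errors` are all assumptions made for this illustration; they are far simpler than the Simulink model used in the case study.

```python
def stays_safe(initial_gap, speed, obstacle_detected,
               dt=0.1, steps=100, brake_decel=8.0, min_gap=2.0):
    """Simulate a toy AEBS and report whether the gap stays above min_gap."""
    gap, v = initial_gap, speed
    for _ in range(steps):
        if obstacle_detected:
            v = max(0.0, v - brake_decel * dt)   # brake while the obstacle is reported
        gap -= v * dt
        if gap <= min_gap:
            return False
    return True

# Environment behaviors: (initial gap in m, ego speed in m/s).
ENVIRONMENTS = [(float(g), float(s))
                for g in range(10, 61, 10) for s in range(5, 26, 5)]

def validity_domain(obstacle_detected):
    return {env for env in ENVIRONMENTS if stays_safe(*env, obstacle_detected)}

U_PLUS = validity_domain(True)     # optimistic: perfect classifier
U_MINUS = validity_domain(False)   # pessimistic: completely wrong classifier
ROU = U_PLUS - U_MINUS             # region of uncertainty

def component_errors(region, misclassifies):
    """Environments in the region where the (hypothetical) ML analyzer
    reports a misclassification; these are candidate counterexamples."""
    return sorted(env for env in region if misclassifies(env))
```

Outside the region of uncertainty the classifier's behavior does not change the verdict, so the expensive component-level analysis can be restricted to `ROU`.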


Sample Results. We have applied the above approach to the problem of com- 
positional falsification of cyber-physical systems (CPS) with machine learning 
components [9]. For this class of CPS, including those with highly non-linear 
dynamics and even black-box components, simulation-based falsification of tem- 
poral logic properties is an approach that has proven effective in industrial prac- 
tice (e.g., [21,46]). We present here a sample of results on the AEBS example 
from [9], referring the reader to more detailed descriptions in the other papers 
on the topic [9, 10]. 

Fig. 3. Machine Learning Analyzer: Searching the Semantic Modification Space. A
concrete semantic modification space (top left) is mapped into a discrete abstract space.
Systematic sampling, using low-discrepancy methods, yields points in the abstract
space. These points are concretized and the NN is evaluated on them to ascertain if they
are correctly or wrongly classified. The misclassifications are fed back for system-level
analysis.

In Fig. 4 we show one result of our analysis for the Inception-v3 deep neural
network. This figure shows both correctly classified and misclassified images on
a range of synthesized images where (i) the environment vehicle is moved away
from or towards the ego vehicle (along z-axis), (ii) it is moved sideways along
the road (along x-axis), or (iii) the brightness of the image is modified. These
modifications constitute the 3 axes of the figure. Our approach finds
misclassifications that do not lead to system-level property violations and also
misclassifications that do lead to such violations. For example, Fig. 4 shows two
misclassified images, one with an environment vehicle that is too far away to be
a safety hazard, as well as another image showing an environment vehicle driving
slightly on the wrong side of the road, which is close enough to potentially cause
a violation of the system-level safety property (of maintaining a safe distance
from the ego vehicle).

For further details about this and other results with our approach, we refer 
the reader to [9,10]. 
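The sampling loop of the ML analyzer (Fig. 3) can be sketched as follows. The Halton sequence is used here as one standard low-discrepancy method; the text does not say which method the authors use. The modification ranges and the `classifier_is_correct` placeholder are hypothetical stand-ins for rendering an image and evaluating the neural network on it.

```python
def van_der_corput(index, base):
    """Return element `index` (1-based) of the van der Corput sequence in `base`."""
    value, fraction = 0.0, 1.0
    while index > 0:
        fraction /= base
        value += fraction * (index % base)
        index //= base
    return value

def halton_points(count, bases=(2, 3, 5)):
    """Low-discrepancy points in the unit cube, one coordinate per base."""
    return [tuple(van_der_corput(i, b) for b in bases)
            for i in range(1, count + 1)]

# Hypothetical ranges for the three semantic modifications.
RANGES = {"z": (5.0, 60.0), "x": (-2.0, 2.0), "brightness": (0.5, 1.5)}

def concretize(point):
    """Map an abstract point in [0, 1]^3 to concrete modification values."""
    return {name: lo + u * (hi - lo)
            for u, (name, (lo, hi)) in zip(point, RANGES.items())}

def classifier_is_correct(mod):
    """Placeholder for evaluating the neural network on the rendered image."""
    return mod["brightness"] > 0.7 or mod["z"] < 20.0

def find_misclassifications(count):
    """Sample, concretize, evaluate, and keep the misclassified points."""
    samples = [concretize(p) for p in halton_points(count)]
    return [m for m in samples if not classifier_is_correct(m)]
```

Compared with uniform random sampling, a low-discrepancy sequence covers the modification space more evenly for the same number of classifier evaluations.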


4.2 Semantic Training 


In this section we discuss two ideas for semantic training and retraining of deep
neural networks. We first discuss the use of hinge loss as a way of incorporating
confidence levels into the training process. Next, we discuss how system-level
counterexamples and associated misclassifications can be used in the retraining
process to both improve the accuracy of ML models and also to gain more
assurance in the overall system containing the ML component. A more detailed study
of using misclassifications (ML component-level counterexamples) to improve
the accuracy of the neural network is presented in [11]; this approach is termed
counterexample-guided data augmentation, inspired by counterexample-guided
abstraction refinement (CEGAR) [7] and similar paradigms.

Fig. 4. Misclassified images for Inception-v3 neural network (trained on ImageNet
with TensorFlow). Red crosses are misclassified images and green circles are correctly
classified. Our system-level analysis finds a corner-case image that could lead to a
system-level safety violation. (Color figure online)


Experimental Setup. As in the preceding section, we consider an Automatic
Emergency Braking System (AEBS) using a DNN-based object detector. However,
in these experiments we use an AEBS deployed within Udacity’s self-driving
car simulator, as reported in our previous work [10].⁴ We modified the Udacity
simulator to focus exclusively on braking. In our case studies, the car follows
some predefined way-points, while accelerating and braking are controlled by
the AEBS connected to a convolutional neural network (CNN). In particular,
whenever the CNN detects an obstacle in the images provided by the onboard
camera, the AEBS triggers a braking action that slows the vehicle down and
avoids a collision with the obstacle.

We designed and implemented a CNN to predict the presence of a cow on 
the road. Given an image taken by the onboard camera, the CNN classifies the
picture into either the “cow” or the “not cow” category. The CNN architecture is shown
in Fig. 5. It consists of eight layers: the first six are alternations of convolutions 
and max-pools with ReLU activations, the last two are a fully connected layer 
and a softmax that outputs the network prediction (confidence level for each 
label). 

We generated a data set of 1000 road images with and without cows. We
split the data set into 80% training and 20% validation data. Our model was
implemented and trained using the TensorFlow library with a cross-entropy cost
function and the Adam optimizer (learning rate $10^{-4}$). The model
reached 95% accuracy on the test set. Finally, the resulting CNN is connected
to the Unity simulator via the Socket.IO protocol.⁵ Figure 6 depicts a screenshot of
the simulator with the AEBS in action in proximity of a cow.

⁴ Udacity’s self-driving car simulator: https://github.com/udacity/self-driving-car-sim.

Fig. 5. CNN architecture.

Fig. 6. Udacity simulator with a CNN-based AEBS in action.


Hinge Loss. In this section, we investigate the relationship between multiclass 
hinge loss functions and adversarial examples. Hinge loss is defined as follows: 


$$\ell(\hat{y}) = \max\big(0,\; k + \max_{i \neq l}(\hat{y}_i) - \hat{y}_l\big)$$

where $(x, y)$ is a training sample, $\hat{y} = F(x)$ is a prediction, and $l$ is the ground
truth label of $x$. For this section, the output $\hat{y}$ is a numerical value indicating the
confidence level of the network for each class. For example, $\hat{y}$ can be the output
of a softmax layer as described in Sect. 2.

⁵ Socket.IO protocol: https://github.com/socketio.

Consider what happens as we vary $k$. Suppose there is an $i \neq l$ s.t. $\hat{y}_i > \hat{y}_l$.
Pick the largest such $\hat{y}_i$, and call its index $i^*$. For $k = 0$, we will incur a loss of
$\hat{y}_{i^*} - \hat{y}_l$ for the example $(x, y)$. However, as we make $k$ more negative, we
increase the tolerance for “misclassifications” produced by the DNN $F$. Specifically,
we incur no penalty for a misclassification as long as the associated confidence level
deviates from that of the ground truth label by no more than $|k|$. The larger the
absolute value of $k$, the greater the tolerance. Intuitively, this biases the training
process towards avoiding “high confidence misclassifications”.
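The role of $k$ can be seen on a two-class example. The sketch below is a direct transcription of the hinge loss defined above for a single sample; the function name and the list representation of the confidence vector are assumptions for illustration.

```python
def hinge_loss(y_hat, l, k):
    """Multiclass hinge loss max(0, k + max_{i != l} y_hat[i] - y_hat[l]).

    y_hat is the list of per-class confidence levels, l is the index of
    the ground-truth label, and k is the margin parameter.
    """
    best_other = max(v for i, v in enumerate(y_hat) if i != l)
    return max(0.0, k + best_other - y_hat[l])
```

With `y_hat = [0.4, 0.6]` and ground truth `l = 0`, the sample is misclassified with a confidence gap of 0.2. The loss is about 0.2 for `k = 0`, about 0.1 for `k = -0.1`, and exactly zero for `k = -0.25`, since the gap no longer exceeds $|k|$.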

In this experiment, we investigate the role of k and explore different param- 
eter values. At training time, we want to minimize the mean hinge loss across 
all training samples. We trained the CNN described above with different val- 
ues of k and evaluated its precision on both the original test set and a set of 
counterexamples generated for the original model, i.e., the network trained with 
cross-entropy loss. 

Table 1 reports accuracy and log loss for different values of $k$ on both the original
and counterexample test sets ($T_{\mathrm{original}}$ and $T_{\mathrm{countex}}$, respectively).


Table 1. Hinge loss with different k values.


        |     T_original     |     T_countex
  k     |  Acc  | Log-loss   |  Acc  | Log-loss
 -------+-------+------------+-------+----------
  0     | 0.69  | 0.68       | 0.11  | 0.70
 −0.01  | 0.77  | 0.69       | 0.00  | 0.70
 −0.05  | 0.52  | 0.70       | 0.67  | 0.69
 −0.1   | 0.50  | 0.70       | 0.89  | 0.68
 −0.25  | 0.51  | 0.70       | 0.77  | 0.68


Table 1 shows interesting results. We note that a negative k increases the 
accuracy of the model on counterexamples. In other words, biasing the training 
process by penalizing high-confidence misclassifications improves accuracy on 
counterexamples! However, the price to pay is a reduction of accuracy on the 
original test set. This is still a very preliminary result and further experimenta- 
tion and analysis is necessary. 


System-Level Counterexamples. By using the compositional falsification
framework presented in Sect. 4.1, we identify orientations, displacements on the
z-axis, and colors of an obstacle that lead to a collision of the vehicle with the
obstacle. Figure 7 depicts configurations of the obstacle that lead to specification
violations, and hence, to collisions.

In an experiment, we augment the original training set with the elements of
$T_{\mathrm{countex}}$, i.e., images of the original test set $T_{\mathrm{original}}$ that are misclassified by
the original model (see Sect. 4.2).

We trained the model with both cross-entropy and hinge loss for 20 epochs.
Both models achieve a high accuracy on the validation set (≈92%). However,
when plugged into the AEBS, neither of these models prevents the vehicle from
colliding with the obstacle in an adversarial configuration. This seems to
indicate that simply retraining with some semantic (system-level) counterexamples
generated by analyzing the system containing the ML model may not be
sufficient to eliminate all semantic counterexamples.

Fig. 7. Semantic counterexamples: obstacle configurations leading to property
violations (in red). (Color figure online)

Interestingly, though, it appears that in both cases the impact of the vehicle 
with the obstacle happens at a slower speed than the one with the original 
model. In other words, the AEBS system starts detecting the obstacle earlier 
than with the original model, and therefore starts braking earlier as well. This 
means that despite the specification violations, the counterexample retraining 
procedure seems to help with limiting the damage in case of a collision. Coupled 
with a run-time assurance framework (see [41]), semantic retraining could help 
mitigate the impact of misclassifications on the system-level behavior. 


5 Conclusion 


In this paper, we surveyed the field of adversarial machine learning with a spe- 
cial focus on deep learning and on test-time attacks. We then introduced the 
idea of semantic adversarial machine (deep) learning, where adversarial anal- 
ysis and training of ML models is performed using the semantics and context 
of the overall system within which the ML models are utilized. We identified 
several ideas for integrating semantics into adversarial learning, including using 
a semantic modification space, system-level formal specifications, training using 
semantic counterexamples, and utilizing more detailed information about the 
outputs produced by the ML model, including confidence levels, in the mod- 
ules that use these outputs to make decisions. Preliminary experiments show 
the promise of these ideas, but also indicate that much remains to be done. 
We believe the field of semantic adversarial learning will be a rich domain for
research at the intersection of machine learning, formal methods, and related
areas.


Abstract. We demonstrate how deep learning over programs is used to
provide (preliminary) augmented programmer intelligence. In the first 
part, we show how to tackle tasks like code completion, code summariza- 
tion, and captioning. We describe a general path-based representation of 
source code that can be used across programming languages and learn- 
ing tasks, and discuss how this representation enables different learning 
algorithms. In the second part, we describe techniques for extracting 
interpretable representations from deep models, shedding light on what 
has actually been learned in various tasks. 


1 Introduction 


We describe a journey from programs to interpretable deep models, and back. 
First, we show how to apply neural networks to learn interesting facts about pro- 
grams, and build (interpretable) models for several programming-related tasks. 
Then, we show how to extract finite-state automata from a given recurrent neural 
network, providing some insight on what a network has actually learned. 


1.1 Motivating Tasks 


Semantic Labeling of Code Snippets. Consider the code snippet of Figure 1. 
This snippet only contains low-level assignments to arrays, but a human reading 
the code may (correctly) label it as performing the reverse operation. Our goal 
is to be able to predict such labels automatically. The right hand side of Fig. 1 
shows the labels predicted automatically using our approach. The most likely 
prediction (77.34%) is reverseArray. Alon et al. [3] provide additional examples. 

Intuitively, this problem is hard because it requires learning a correspondence 
between the entire content of a code snippet and a semantic label. That is, it 
requires aggregating possibly hundreds of expressions and statements from the 
snippet into a single, descriptive label.


String[] f(final String[] array) {
    final String[] newArray = new String[array.length];
    for (int index = 0; index < array.length; index++) {
        newArray[array.length - index - 1] = array[index];
    }
    return newArray;
}

Predictions:
    reverseArray   77.34%
    reverse        18.18%
    subArray        1.45%
    copyArray       0.74%


Fig. 1. A code snippet and its predicted labels as computed by our model.


iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader( 
new iTextSharp.text.pdf.RandomAccessFileOrArray(@"C:\PDFFile.pdf"), null); 


Prediction: get the text of a pdf file in C#


Fig. 2. A code snippet and its predicted caption as computed by our model. 


Captioning Code Snippets. Consider the short code snippet of Fig. 2. The
goal of code captioning is to assign a natural language caption that captures the
task performed by the snippet. For the example of Fig. 2, our approach automatically
predicts the caption “get the text of a pdf file in C#”. Intuitively, this
task is harder than semantic labeling, as it requires the generation of a natural 
language sentence in addition to capturing (something about) the meaning of 
the code snippet. 


OkHttpClient ok = new OkHttpClient(); 
Request request = new Request.Builder().url("programming.ai").build(); 
Response response =_ 


Prediction: ok.newCall(request).execute()


Fig. 3. A code snippet and its predicted completion as computed by our model. 


Code Completion. Consider the code of Fig. 3. Our code completion automatically
predicts the next steps in the code: ok.newCall(request).execute().
This task requires prediction of the missing part of the code based on a given 
context. Technically, this can be expressed as predicting a completion of a partial 
abstract syntax tree. 

In the next section, we show how techniques based on neural networks address 
all of these tasks, as well as other programming-related tasks.


2 From Programs to Deep Models


2.1 Representation 


Leveraging machine learning models for predicting program properties such as 
variable names, method names, and expression types is a topic of much recent 
interest [1,2,6,8,9]. These techniques are based on learning a statistical model 
from a large amount of code and using the model to make predictions in new 
programs. A major challenge in these techniques is how to represent instances 
of the input space to facilitate learning [10]. Designing a program representation 
that enables effective learning is a critical task that is often done manually for 
each task and programming language. 


Our Approach. We present a program representation for learning from pro- 
grams. Our approach uses different path-based abstractions of the program’s 
abstract syntax tree. This family of path-based representations is natural, gen- 
eral, fully automatic, and works well across different tasks and programming 
languages. 


while (!d) {
    if (someCondition()) {
        d = true;
    }
}

(a) A simple JavaScript program.

[Figure: AST of the program, with nodes including UnaryPrefix!, If, SymbolRef,
Assign=, and True, and one path highlighted.]

(b) The program’s AST, and a path.


Fig. 4. A JavaScript program and its AST, along with an example of one of the paths.


AST Paths. We define AST paths as paths between nodes in a program’s 
abstract syntax tree (AST). To automatically generate paths, we first parse the 
program to produce an AST, and then extract paths between nodes in the tree. 
We represent a path in the AST as a sequence of nodes connected by up and 
down movements, and represent a program element as the set of paths that 
its occurrences participate in. Figure 4a shows an example JavaScript program. 
Figure 4b shows its AST, and one of the extracted paths. The path from the first 
occurrence of the variable d to its second occurrence can be represented as: 


SymbolRef ↑ UnaryPrefix! ↑ While ↓ If ↓ Assign= ↓ SymbolRef


This is an example of a pairwise path between leaves in the AST, but in
general the family of path-based representations contains n-wise paths, which
do not necessarily span between leaves and do not necessarily contain all the
nodes in between. We consider several choices of subsets of this family in [4].
Using a path-based representation has several major advantages:


1. Paths are generated automatically: there is no need for manual design of fea- 
tures aiming to capture potentially interesting relationships between program 
elements. This approach extracts unexpectedly useful paths, without the need 
for an expert to design features. The user is required only to choose a subset 
of our proposed family of path-based representations. 

2. This representation is useful for any programming language, without the need 
to identify common patterns and nuances in each language. 

3. The same representation is useful for a variety of prediction tasks, by using 
it with off-the-shelf learning algorithms or by simply replacing the represen- 
tation of program elements in existing models (as we show in [4]). 

4. AST paths are purely syntactic, and do not require any semantic analysis. 
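
As a rough illustration of the pairwise path described above, the following
Python sketch walks from one leaf up to the lowest common ancestor and then
down to the other leaf. The Node class, the ast_path helper, and the exact
tree shape (in particular the Call node under If) are assumptions made for
this example; this is not the extraction code used in [4].

```python
class Node:
    """A minimal AST node: a label, child nodes, and a link to the parent."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Return the chain node, parent, grandparent, ... up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def ast_path(start, end):
    """Describe the path from start to end using up/down arrows."""
    up = ancestors(start)
    down = ancestors(end)
    position_in_up = {id(n): i for i, n in enumerate(up)}
    for j, n in enumerate(down):
        if id(n) in position_in_up:  # lowest common ancestor found
            i = position_in_up[id(n)]
            break
    else:
        raise ValueError("nodes are not in the same tree")
    rising = [n.label for n in up[: i + 1]]           # start ... ancestor
    falling = [n.label for n in reversed(down[:j])]   # below ancestor ... end
    return " ↑ ".join(rising) + "".join(" ↓ " + label for label in falling)


# Hypothetical AST for: while (!d) { if (someCondition()) { d = true; } }
first_d = Node("SymbolRef")
second_d = Node("SymbolRef")
tree = Node("While", [
    Node("UnaryPrefix!", [first_d]),
    Node("If", [
        Node("Call", [Node("SymbolRef")]),
        Node("Assign=", [second_d, Node("True")]),
    ]),
])

path = ast_path(first_d, second_d)
print(path)
```

Because the helper only needs parent and child links, the same sketch applies
to any language for which a parser is available.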


2.2  Code2vec: Learning Code Embeddings 


In [3], we present a framework for predicting program properties using neural 
networks. The main idea is a neural network that learns code embeddings - con- 
tinuous distributed vector representations for code. The code embeddings allow 
us to model the correspondence between code snippets and labels in a natural and 
effective manner. By learning code embeddings, our long-term goal is to enable 
the application of neural techniques to a wide range of programming-language 
tasks. A live demo of the framework is available at https://code2vec.org. 

Our neural network architecture uses a representation of code snippets that 
leverages the structured nature of source code, and learns to aggregate multiple 
syntactic paths into a single vector. This ability is fundamental for the applica- 
tion of deep learning in programming languages. By analogy, word embeddings 
in natural language processing (NLP) started a revolution of application of deep 
learning for NLP tasks. 

The input to our model is a code snippet and a corresponding tag, label, 
caption, or name. This tag expresses the semantic property that we wish the 
network to model, for example: a tag or name that should be assigned to the 
snippet, or the name of the method, class, or project that the snippet was taken 
from. Let C be the code snippet and L be the corresponding label or tag. Our 
underlying hypothesis is that the distribution of labels can be inferred from 
syntactic paths in C. Our model therefore attempts to learn the tag distribution, 
conditioned on the code: P(L | C). 


Model. For the full details of the model, see [3]. At a high level, the key point 
is that a code snippet is composed of a bag of contexts, and each context is 
represented by a vector whose values are learned. The values of this vector 
capture two distinct goals: (i) the semantic meaning of this context, and (ii) the 
amount of attention this context should get. 

The problem is as follows: given an arbitrarily large number of context vec- 
tors, we need to aggregate them into a single vector. Two trivial approaches 
would be to learn the most important one of them, or to use them all by vector- 
averaging them. These alternatives are shown to yield poor results (see [3]). 
Our main observation is that all context vectors need to be used, but the 
model should learn how much focus to give each vector. This is done by learning 
how to average context vectors in a weighted manner. The weighted average is 
obtained by weighting each vector by its dot product with another global atten- 
tion vector. The vector of each context and the attention vector are trained and 
learned simultaneously, using the standard neural approach of backpropagation. 
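
The weighted averaging described above can be sketched in a few lines of
plain Python. This is a minimal sketch under stated assumptions: the dot
products are normalised with a softmax so that the weights sum to one (the
text above only says that each vector is weighted by its dot product with the
attention vector), the function names are hypothetical, and training by
backpropagation is omitted. See [3] for the actual model.

```python
import math


def attention_weights(contexts, attention):
    """Softmax over the dot product of each context vector with `attention`."""
    scores = [sum(c * a for c, a in zip(ctx, attention)) for ctx in contexts]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(contexts, attention):
    """Combine all context vectors into one attention-weighted average."""
    weights = attention_weights(contexts, attention)
    dim = len(attention)
    return [
        sum(w * ctx[k] for w, ctx in zip(weights, contexts))
        for k in range(dim)
    ]


# Two toy context vectors; the attention vector favours the first one.
contexts = [[1.0, 0.0], [0.0, 1.0]]
uniform = aggregate(contexts, [0.0, 0.0])
focused = attention_weights(contexts, [10.0, 0.0])
```

With a zero attention vector every context receives the same weight, so the
result reduces to plain vector averaging; a learned attention vector shifts
the weights toward the contexts that matter for the prediction.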


Interpreting Attention. Despite the “black-box” reputation of neural net- 
works, our model is partially interpretable thanks to the attention mechanism, 
which allows us to visualize the distribution of weights over the bag of path- 
contexts. Figures 5 and 6 illustrate a few predictions, along with the path- 
contexts that were given the most attention in each method. The width of each 
of the visualized paths is proportional to the attention weight that it was allo- 
cated. We note that in these figures the path is represented only as a connecting 
line between tokens, while in fact it contains rich syntactic information which is 
not expressed properly in the figures. 


[Figure 5: a Java method over a String[] array, with its attended paths. 
Predictions: reverseArray 77.34%, reverse 18.18%, subArray 1.45%.] 


Fig. 5. Predictions and attention paths for the program of Fig. 1. The width of a path 
is proportional to its attention. 


[Figure 6: two Java methods with their attended paths. (a) boolean f(int n); 
predictions: isPrime, isNonSingular, factorial. (b) A method that swaps array 
elements inside nested loops; predictions: sort 99.80%, bubbleSort 0.13%, 
shorten 0.02%.] 


Fig. 6. Example predictions from our model. The width of a path is proportional to 
its attention. 
The examples of Figs. 5 and 6 are interesting since the top names are accurate 
and descriptive (reverseArray and reverse; isPrime; sort and bubbleSort) but do 
not appear explicitly in the code snippets. The code snippets, and specifically 
the most attended path-contexts, describe lower-level operations. Suggesting a 
descriptive name for each of these methods is difficult and might take time even 
for a trained human programmer. 


2.3 Code2seq: Generating Sequences from Structured 
Representations of Code 


In contrast to classical (and widespread) seq2seq models for translation, we intro- 
duce a new model that performs encoding over source code, and decoding to 
natural language. 

Following [3,4], we introduce an approach for encoding source code that 
leverages the unique syntactic structure of programming languages. We represent 
a given code snippet as a set of paths over its abstract syntax tree (AST), where 
each path is compressed to a fixed-length vector. During decoding, code2seq 
attends over a different weighted sum of the path-vectors to produce each output 
token, much like NMT models attend over contextualized token representations 
in the source sentence. A live demo of the framework is available at 
https://code2seq.org. 


3 From Deep Models to Automata 


In this section, we focus on extraction of finite-state automata from recurrent 
neural networks (RNNs). In recent years, there has been significant interest in 
the use of RNNs for learning languages. Like other 
supervised machine learning techniques, RNNs are trained based on a large set 
of examples of the target concept. While neural networks can reasonably approx- 
imate a variety of languages, and even precisely represent a regular language [5], 
they are in practice unlikely to generalize exactly to the concept being trained, 
and what they eventually learn in actuality is unclear [T]. Our goal in this work is 
to provide some insight into what a given trained network has actually learned, 
without requiring changes to the network architecture, or access to the original 
training data. 


Recurrent Neural Networks. Recurrent neural networks (RNNs) are a class 
of neural networks which are used to process sequences of arbitrary lengths. 
When operating over sequences of discrete alphabets, the input sequence is fed 
into the RNN on a symbol-by-symbol basis. For each input symbol the RNN 
outputs a state vector representing the sequence up to that point. A state vector 
and an input symbol are combined for producing the next state vector. The 
RNN is essentially a parameterized mathematical function that takes as input 
a state vector and an input vector, and produces a new state vector. The state 
vectors can be passed to a classification component that is used to produce a 
binary or multi-class classification decision. The RNN is trainable, and, when 
trained together with the classification component, the training procedure drives 
the state vectors to provide a representation of the prefix which is informative 
for the classification task being trained. We call a combination of an RNN and 
a classification component an RNN-acceptor. 

A trained RNN-acceptor can be seen as a state machine in which the states 
are high-dimensional vectors: it has an initial state, a well defined transition 
function between internal states, and a well defined classification for each internal 
state. 
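
To make the state-machine view concrete, here is a minimal sketch of an
RNN-acceptor in plain Python. Everything in it is an assumption made for
illustration: the RNNAcceptor class, the simple tanh update rule, and the
hand-picked (not trained) one-dimensional weights, which happen to accept
exactly the words over {a, b} that contain at least one a.

```python
import math


class RNNAcceptor:
    """An RNN (step) plus a classification component (classify)."""

    def __init__(self, initial, w_state, w_input, embedding, w_out):
        self.initial = initial      # initial state vector
        self.w_state = w_state      # state-to-state weight matrix
        self.w_input = w_input      # input-to-state weight matrix
        self.embedding = embedding  # maps each symbol to an input vector
        self.w_out = w_out          # weights of the linear classifier

    def step(self, state, symbol):
        """Transition function: (state vector, symbol) -> next state vector."""
        x = self.embedding[symbol]
        new_state = []
        for row_s, row_x in zip(self.w_state, self.w_input):
            pre = sum(w * s for w, s in zip(row_s, state))
            pre += sum(w * v for w, v in zip(row_x, x))
            new_state.append(math.tanh(pre))
        return new_state

    def classify(self, state):
        """Binary classification of a single state vector."""
        return sum(w * s for w, s in zip(self.w_out, state)) > 0.0

    def accepts(self, word):
        """Feed the word symbol by symbol and classify the final state."""
        state = list(self.initial)
        for symbol in word:
            state = self.step(state, symbol)
        return self.classify(state)


# Hand-picked weights (an assumption, not a trained network).
toy = RNNAcceptor(
    initial=[-1.0],
    w_state=[[2.0]],
    w_input=[[4.0]],
    embedding={"a": [1.0], "b": [0.0]},
    w_out=[1.0],
)
```

The three ingredients named above are all visible: an initial state, a
well-defined transition function (step), and a classification for every
state (classify).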


Problem Definition. Given an RNN-acceptor R trained to accept or reject 
sequences over an alphabet Σ, our goal is to extract a deterministic finite-state 
automaton (DFA) A that mimics the behavior of R. That is, our goal is to 
extract a DFA A such that the language L ⊆ Σ* of sequences accepted by A 
is observably equivalent to that accepted by R. Intuitively, we would like to 
obtain a DFA that accepts exactly the same language as the network, but this 
is generally impossible in practice, as we do not know in advance any bound on 
the maximum sample length necessary in order to observe all of its behavior. 


Extraction Using Queries and Counterexamples. In [11], we present a 
framework for extracting a finite state automaton from a given RNN. The main 
idea is to use the L* learning algorithm to learn an automaton while using the 
RNN as the teacher. 
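
The role of the teacher can be sketched as follows. L* asks two kinds of
questions: membership queries (is this word accepted?) and equivalence
queries (is this hypothesis DFA correct, and if not, what is a
counterexample?). In this sketch the membership query simply runs the
network. The equivalence query is answered by exhaustively comparing all
words up to a fixed length, which is a simplifying stand-in chosen for this
example and not the procedure of [11]. All class and function names are
hypothetical, and the network is replaced by an ordinary Python function.

```python
from itertools import product


class DFA:
    def __init__(self, transitions, start, accepting):
        self.transitions = transitions  # maps (state, symbol) -> state
        self.start = start
        self.accepting = set(accepting)

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.transitions[(state, symbol)]
        return state in self.accepting


class BoundedTeacher:
    """Answers L*-style queries on behalf of a black-box acceptor."""

    def __init__(self, acceptor, alphabet, max_len):
        self.acceptor = acceptor  # callable: word -> bool (the "network")
        self.alphabet = alphabet
        self.max_len = max_len

    def membership_query(self, word):
        return self.acceptor(word)

    def equivalence_query(self, hypothesis):
        """Return a shortest disagreeing word up to max_len, or None."""
        for length in range(self.max_len + 1):
            for letters in product(self.alphabet, repeat=length):
                word = "".join(letters)
                if hypothesis.accepts(word) != self.acceptor(word):
                    return word
        return None


def network(word):
    """Stand-in for a trained RNN-acceptor: accepts words containing 'a'."""
    return "a" in word


teacher = BoundedTeacher(network, alphabet="ab", max_len=6)
accept_all = DFA({(0, "a"): 0, (0, "b"): 0}, start=0, accepting={0})
contains_a = DFA(
    {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1},
    start=0,
    accepting={1},
)
first_counterexample = teacher.equivalence_query(accept_all)
second_counterexample = teacher.equivalence_query(contains_a)
```

A bounded search of this kind can only report agreement up to the chosen
length, which is exactly the limitation noted in the problem definition.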


[Figure 7: state diagrams of DFAs (a) and (b).] 


Fig. 7. Two DFAs resembling, but not perfectly matching, the correct DFA for the 
regular language of tokenised JSON lists, (\[\])|(\[[S0NTF](,[S0NTF])*\])$. DFA (a) 
is almost correct, but also accepts list-like sequences in which the last item is 
missing, i.e. there is a comma followed by a closing bracket. DFA (b) is returned 
by L* after the teacher (network) rejects (a), but is also not a correct 
representation of the target language: it treats the sequence [, as a legitimate 
list item equivalent to the characters S, 0, N, T, F. 
3.1 What Has a Network Learned? 


Tokenized JSON Lists. We trained a GRU network with 2 layers and hidden 
size 100 on the regular language representing a simple tokenized JSON list with 
no nesting, 


(\[\])|(\[[S0NTF](,[S0NTF])*\])$ 


over the 8-letter alphabet {[, ], S, 0, N, T, F, ,}, to 100% accuracy on a training 
set of size 20000 and a test set of size 2000, both evenly split between positive 
and negative examples. As before, we extracted a DFA from this network using our 
method. 
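
For reference, membership in the target language can be checked directly with
Python's re module. This is a small sketch: the helper name is hypothetical,
and words are written without spaces between tokens, which is an assumption
about the tokenisation.

```python
import re

# The target language: an empty list, or one or more items from
# {S, 0, N, T, F} separated by commas and enclosed in brackets.
TOKENIZED_JSON_LIST = re.compile(r"(\[\])|(\[[S0NTF](,[S0NTF])*\])$")


def in_target_language(word):
    """True if the whole word is a non-nested tokenized JSON list."""
    return TOKENIZED_JSON_LIST.fullmatch(word) is not None


samples = ["[]", "[S,0,N]", "[S,]", "[[,]", "[S,N"]
labels = {word: in_target_language(word) for word in samples}
```

The words "[S,]" and "[[,]" are rejected here; they correspond to the two
near-miss behaviours of the DFAs in Fig. 7 (a missing last item, and the
sequence [, treated as an item).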

Within 2 counterexamples (1 provided and 1 generated), our method 
extracted the automaton shown in Fig. 7a, which is almost but not quite repre- 
sentative of the target language. A few seconds later it returned a counterexam- 
ple to this DFA which pushed L* to refine further and return the DFA shown 
in Fig. 7b, which is also almost but not quite representative of zero-nesting tok- 
enized JSON lists. 

Ultimately, after 400 s, our method extracted (but did not reach equivalence 
on) an automaton of size 441, returning the counterexamples listed in Table 1 
and achieving 100% accuracy against the network on both its train set and all 
sampled sequence lengths. As before, we note that each state split by the method 
is justified by concrete inputs to the network, and so the extraction of a large 
DFA is a sign of the inherent complexity of the learned network behavior. 


Table 1. Counterexamples returned to the equivalence queries made by L* during 
extraction of a DFA from a network trained to 100% accuracy on both train and 
test sets on the regular language (\[\])|(\[[S0NTF](,[S0NTF])*\])$ over the 8-letter 
alphabet {[, ], S, 0, N, T, F, ,}. Counterexamples on which the network 
classification differs from the target classification highlight the discrepancies 
between the network behaviour and the target behaviour. 


Counterexample generation for the non-nested tokenized JSON-lists language 

Counterexample     Generation time   Network          Target 
                   (seconds)         classification   classification 
[]                 provided          True             True 
SS]                3.49              False            False 
[[, ]              7.12              True             False 
[S, ,              8.61              True             False 
[0, F              8.38              True             False 
N, 0,              8.07              False            False 
[S, N, 0,          9.43              True             False 
T, S,              9.56              False            False 
S, S, T, []        15.15             False            False 
F, T, [            3.23              False            False 
[N, F, S, 0        10.04             True             False 
[S, N, [, , , ,    27.79             True             False 
[T, 0, T,          28.06             True             False 
[S, T, 0, ],       26.63             True             False 


3.2 Counterexamples 


For many RNN-acceptors that train to 100% accuracy and exhibit perfect test set 
behavior on large test sets, our method was able to find many simple examples 
which the network misclassifies. 

For instance, for a network trained to classify simple email addresses over the 
38-letter alphabet {a, b, ..., z, 0, 1, ..., 9, @, .} as defined by the regular expression 


[a-z][a-z0-9]*@[a-z0-9]+.(com|net|co.[a-z][a-z])$ 


with 100% accuracy on a 40,000 sample train set and 100% accuracy on a 2,000 
sample test set (i.e., a seemingly perfect network), the refinement-based L* 
extraction quickly returned several counterexamples, showing words that the 
network classifies incorrectly (e.g., the network accepted the non-email sequence 
25.net). While we could not extract a representative DFA from the network in 
the allotted time frame, our method did show that the network learned a far 
more elaborate (and incorrect) function than needed. 
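
The notion of a misclassified word can be illustrated by comparing a
classifier against the target expression. In this sketch the dots of the
expression are escaped, on the assumption that they denote the literal "."
of the alphabet. The flawed_classifier function is a hypothetical stand-in
for a network and does not reproduce the behaviour of the trained RNN.

```python
import re

# Target language of simple email addresses (dots read as literal '.').
SIMPLE_EMAIL = re.compile(r"[a-z][a-z0-9]*@[a-z0-9]+\.(com|net|co\.[a-z][a-z])")


def is_simple_email(word):
    return SIMPLE_EMAIL.fullmatch(word) is not None


def disagreements(classifier, words):
    """Words on which the classifier and the target language differ."""
    return [w for w in words if classifier(w) != is_simple_email(w)]


def flawed_classifier(word):
    """Hypothetical stand-in for a network: only looks at the suffix."""
    return word.endswith((".com", ".net"))


words = ["a@b.com", "25.net", "x1@y2.co.uk", "a@b.org"]
counterexamples = disagreements(flawed_classifier, words)
```

A classifier of this kind would pass any test set that happens to contain
only well-formed ".com" and ".net" addresses, while still accepting a
non-email word such as 25.net.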

Beyond demonstrating the counterexample generation capabilities of our 
extraction method, these results also highlight the brittleness in generalization of 
trained RNNs, and suggest that evidence based on test-set performance 
should be taken with extreme caution. 


4 Conclusion 


We provide a brief description of a journey from programs to (somewhat) 
interpretable deep models that work well across different tasks and different 
programming languages. As we gained experience with these models, the question 
of what they have actually learned became more important (and subtle). Attention over 
AST paths provides some insight on what drives the predictions performed by 
(some of) the models, but a different approach is required for RNN-based models. 
This motivated the second part of our journey, trying to extract an interpretable 
model from a given RNN acceptor. This also motivated future work on classifying 
what can and cannot be learned by different kinds of RNNs [12]. 
Abstract. We report on the development and use of formal verifica- 
tion tools within Amazon Web Services (AWS) to increase the secu- 
rity assurance of its cloud infrastructure and to help customers secure 
themselves. We also discuss some remaining challenges that could inspire 
future research in the community. 


1 Introduction 


Amazon Web Services (AWS) is a provider of cloud services, meaning on-demand 
access to IT resources via the Internet. AWS adoption is widespread, with over a 
million active customers in 190 countries, and $5.1 billion in revenue during the 
last quarter of 2017. Adoption is also rapidly growing, with revenue regularly 
increasing by 40-45% year-over-year. 

The challenge for AWS in the coming years will be to accelerate the devel- 
opment of its functionality while simultaneously increasing the level of security 
offered to customers. In 2011, AWS released over 80 significant services and fea- 
tures. In 2012, the number was nearly 160; in 2013, 280; in 2014, 516; in 2015, 
722; in 2016, 1,017. Last year the number was 1,430. At the same time, AWS 
is increasingly being used for a broad range of security-critical computational 
workloads. 

Formal automated reasoning is one of the investments that AWS is making 
in order to facilitate continued simultaneous growth in both functionality and 
security. The goal of this paper is to convey information to the formal verification 
research community about this industrial application of the community's results. 
Toward that goal we describe work within AWS that uses formal verification to 
raise the level of security assurance of its products. We also discuss the use of 
formal reasoning tools by externally-facing products that help customers secure 
themselves. We close with a discussion about areas where we see that future 
research could contribute further impact. 


Related Work. In this work we discuss efforts to make formal verification appli- 
cable to use-cases related to cloud security at AWS. For information on previous 
work within AWS to show functional correctness of some key distributed algo- 
rithms, see [43]. Other providers of cloud services also use formal verification to 
establish security properties, e.g. [23,34]. 
Our overall strategy on the application of formal verification has been heav- 
ily influenced by the success of previous applied formal verification teams in 
industrial settings that worked as closely with domain experts as possible, e.g. 
work at Intel [33,50], NASA [31,42], Rockwell Collins [25], the Static Driver 
Verifier project [20], Facebook [45], and the success of Prover AB in the domain 
of railway switching [11]. 

External tools that we use include Boogie [1], Coq [4], CBMC [2], CVC4 
[5], Dafny [6], HOL-light [8], Infer [9], OpenJML [10], SAW [13], SMACK [14], 
Souffle [37], TLA+ [15], VCC [16], and Z3 [17]. We have also collaborated with 
many organizations and individuals, e.g. Galois, Trail of Bits, the University of 
Sydney, and the University of Waterloo. Finally, many PhD student interns have 
applied their prototype tools to our problems during their internships. 


2 Security of the Cloud 


Amazon and AWS aim to innovate quickly while simultaneously improving on 
security. An original tenet from the founding of the AWS security team is to 
never be the organization that says “no”, but instead to be the organization 
that answers difficult security challenges with “here’s how”. Toward this goal, the 
AWS security team works closely with product service teams to quickly identify 
and mitigate potential security concerns as early as possible while simultaneously 
not slowing the development teams down with bureaucracy. The security team 
also works with service teams early to facilitate the certification of compliance 
with industry standards. 

The AWS security team performs formal security reviews of all fea- 
tures/services, e.g. 1,430 services/features in 2017, a 41% year-over-year increase 
from 2016. Mitigations to security risks that are developed during these security 
reviews are documented as a part of the security review process. Another impor- 
tant activity within AWS is ensuring that the cloud infrastructure stays secure 
after launch, especially as the system is modified incrementally by developers. 


Where Formal Reasoning Fits In. The application security review process 
used within AWS increasingly involves the use of deductive theorem proving 
and/or symbolic model checking to establish important temporal properties 
of the software. For example, in 2017 alone the security team used deductive 
theorem provers or model checking tools to reason about cryptographic pro- 
tocols/systems (e.g. [24]), hypervisors, boot-loaders/BIOS/firmware (e.g. [27]), 
garbage collectors, and network designs. Overall, formal verification engagements 
within the AWS security team increased 76% year-over-year in 2017, and found 
45% more pre-launch security findings year-over-year in 2017. 

To support our needs we have modified a number of open-source projects and 
contributed those changes back. For example, changes to CBMC [2] facilitate its 
application to C-based systems at the bottom of the compute stack used in 
AWS data centers [27]. Changes to SAW [13] add support for the Java program- 
ming language. Contributions to SMACK [14] implement automata-theoretic 
constructions that facilitate automatic proofs that s2n [12] correctly implements 
the code balancing mitigation for side-channel timing attacks. Source-code con- 
tributions to OpenJML [10] add support for Java 8 features needed to prove the 
correctness of code implementing a secure streaming protocol used throughout 
AWS. 

In many cases we use formal verification tools continuously to ensure that 
security is implemented as designed, e.g. [24]. In this scenario, whenever changes 
and updates to the service/feature are developed, the verification tool is re- 
executed automatically prior to the deployment of the new version. 

The security operations team also uses automated formal reasoning tools in 
its effort to identify security vulnerabilities found in internal systems and deter- 
mine their potential impact on demand. For example, an SMT-based semantic- 
level policy reasoning tool is used to find misconfigured resource policies. 

In general we have found that the internal use of formal reasoning tools 
provides good value for the investment made. Formal reasoning provides higher 
levels of assurance than testing for the properties established, as it provides 
clear information about what has and has not been secured. Furthermore, formal 
verification of systems can begin long before code is written, as we can prove the 
correctness of the high-level algorithms and protocols, and use under-constrained 
symbolic models for unwritten code or hardware that has not been fabricated 
yet. 


3 Securing Customers in the Cloud 


AWS offers a set of cloud-based services designed to help customers be secure in 
the cloud. Some examples include AWS Config, which provides customers with 
information about the configurations of their AWS resources; Amazon Inspec- 
tor, which provides automated security assessments of customer-authored AWS- 
based applications; Amazon GuardDuty, which monitors AWS accounts looking 
for unusual account usage on behalf of customers; Amazon Macie, which helps 
customers discover and classify sensitive data at risk of being leaked; and AWS 
Trusted Advisor, which automatically makes optimization and security recom- 
mendations to customers. 

In addition to automatic cloud-based security services, AWS provides peo- 
ple to help customers: Solutions Architects from different disciplines work with 
customers to ensure that they are making the best use of available AWS ser- 
vices; Technical Account Managers are assigned to customers and work with 
them when security or operational events arise; the Professional Services team 
can be hired by customers to work on bespoke cloud-based solutions. 


Where Formal Reasoning Fits In. Automated formal reasoning tools today 
provide functionality to customers through the AWS services Config, Inspector, 
GuardDuty, Macie, Trusted Advisor, and the storage service S3. As an example, 
customers using the S3 web-based console receive alerts (via SMT-based 
reasoning) when their S3 bucket policies are possibly misconfigured. AWS Macie 
uses the same engine to find possible data exfiltration routes. Another appli- 
cation is the use of high-performance datalog constraint solvers (e.g. [37]) to 
reason about questions of reachability in complex virtual networks built using 
AWS EC2 networking primitives. The theorem proving service behind this func- 
tionality regularly receives 10s of millions of calls daily. 
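
To give a flavour of reachability as a Datalog-style question, the sketch
below computes a least fixpoint for the two rules reach(X, Y) :- link(X, Y)
and reach(X, Z) :- reach(X, Y), link(Y, Z). It is a toy written for this
text: the function name, the node names, and the flat link relation are
assumptions, and it does not model EC2 networking primitives or the solver
of [37].

```python
def reachable_pairs(links):
    """Naive fixpoint: repeat the recursive rule until nothing new appears."""
    reach = set(links)  # reach(X, Y) :- link(X, Y).
    changed = True
    while changed:
        changed = False
        for (a, b) in list(reach):
            for (c, d) in links:
                # reach(X, Z) :- reach(X, Y), link(Y, Z).
                if b == c and (a, d) not in reach:
                    reach.add((a, d))
                    changed = True
    return reach


# Hypothetical network: directed links between named components.
links = {
    ("internet", "load_balancer"),
    ("load_balancer", "web_server"),
    ("web_server", "database"),
}
reach = reachable_pairs(links)
exposed = ("internet", "database") in reach
```

A question such as "can traffic from the internet reach the database?" then
becomes a lookup in the computed relation.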

In addition to the automated services that use formal techniques, some mem- 
bers of the AWS Solutions Architects, Technical Account Managers and Profes- 
sional Services teams are applying and/or deploying formal verification directly 
with customers. In particular, in certain security-sensitive sectors (e.g. financial 
services), the Professional Services organization is working directly with 
customers to deploy formal reasoning into their AWS environments. 

The customer reaction to features based on formal reasoning tools has been 
overwhelmingly positive, both anecdotally and quantitatively. Calls by 
AWS services to the automated reasoning tools increased by four orders of mag- 
nitude in 2017. With the formal verification tools providing the semantic foun- 
dation, customers can make stronger universal statements about their policies 
and networks and be confident that their assumptions are not violated. 


4 Challenges 


At AWS we have successfully applied existing or bespoke formal verification tools 
to both raise the level of security assurance of the cloud as well as help customers 
protect themselves in the cloud. We now know that formal verification provides 
value to applications in cloud security. There are, however, many problems yet 
to be solved and many applications of formal verification techniques yet to be 
discovered and/or applied. In the future we are hoping to solve the problems 
we face in partnership with the formal verification research community. In this 
section we outline some of those challenges. Note that in many cases existing 
teams in the research community will already be working on topics related to 
these problems, too many to cite comprehensively. Our comments are intended 
to encourage and inspire more work in this space. 


Reasoning About Risk and Feasibility. A security engineer spends the 
majority of their time informally reasoning about risk. The same is true for any 
corporate Chief Information Security Officer (CISO). We (the formal verifica- 
tion community) potentially have a lot to contribute in this space by developing 
systems that help reason more formally about the consequences of combinations 
of events and their relationships to bugs found in systems. Furthermore, our 
community has a lot to offer by bridging between our concept of a counterex- 
ample and the security community’s notion of a proof of concept (PoC), which 
is a constructive realization of a security finding in order to demonstrate its fea- 
sibility. Often security engineers will develop partial PoCs, meaning that they 
combine reasoning about risk and the finding of constructive witnesses in order 
to increase their confidence in the importance of a finding. There are valuable 
results yet to be discovered by our community at the intersection of reasoning 
about and synthesis of threat models, environment models, risk/probabilities, 
counterexamples, and PoCs. A few examples of current work on this topic include 
[18, 28, 30, 44, 48]. 
Fixes Not Findings. Industrial users of formal verification technology need to 
make systems more secure, not merely find security vulnerabilities. This is true 
both for securing the cloud, as well as helping customers be secure in the cloud. 
If there are security findings, the primary objective is to find them and fix them 
quickly. In practice a lot of work is ahead for an organization once a security 
finding has been identified. As a community, the more we can do to reduce the 
friction for users trying to triage and fix vulnerabilities, the better. Tools that 
report false findings are quickly ignored by developers, thus as a community 
we should focus on improving the fidelity of our tools. Counterexamples can be 
downplayed by optimistic developers: any assistance in helping users understand 
the bugs found and/or their consequences is helpful. Security vulnerabilities that 
require fixes that are hard to build or hard to deploy are an especially important 
challenge: our community has a lot to offer here via the development of more 
powerful synthesis/repair methods (e.g. [22,32,39]) that take into account threat 
models, environment models, probabilities, and counterexamples. 


Auditable Proof Artifacts for Compliance. Proof is actually two activi- 
ties: searching for a candidate proof, and checking the candidate proof’s validity. 
The searching is the art form, often involving a combination of heuristics that 
attempt to work around the undecidable. The checking of a proof is (in princi- 
ple) the boring yet rigorous part, usually decidable, often linear in the size of 
the proof. Proof artifacts that can be re-checked have value, especially in 
applications related to compliance certification, e.g. DO-333 [26], CENELEC EN 
50128 SIL 4 [11], EAL7 MILS [51]. Non-trivial parts of the various compliance 
and conformance standards can be checked via mechanical proof, e.g. parts of 
PCI and FIPS 140. Found proofs of compliance controls that can be shared 
and checked/re-checked have the possibility to reduce the cost of compliance 
certification, as well as reduce the time-to-market for organizations who require 
certification before using systems. 
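
The gap between searching for a proof and checking one can be illustrated
with propositional resolution. Finding a refutation may require search, but
checking a given refutation is a single pass over its steps. The sketch below
is a generic illustration, not a tool or proof format used at AWS: the
function name and the (left index, right index, pivot) step format are
assumptions, and clauses use signed integers as literals.

```python
def check_refutation(premises, steps):
    """Check a resolution refutation in one pass over its steps.

    premises: list of clauses, each a collection of non-zero integers,
              where -n denotes the negation of variable n.
    steps:    list of (i, j, pivot) triples; each resolves clause i
              (containing pivot) with clause j (containing -pivot), where
              i and j index the premises or earlier derived clauses.
    Returns True only if every step is valid and the last derived
    clause is empty.
    """
    clauses = [frozenset(c) for c in premises]
    for i, j, pivot in steps:
        if not (0 <= i < len(clauses) and 0 <= j < len(clauses)):
            return False
        left, right = clauses[i], clauses[j]
        if pivot not in left or -pivot not in right:
            return False
        clauses.append((left - {pivot}) | (right - {-pivot}))
    return len(clauses) > len(premises) and len(clauses[-1]) == 0


# (x1 or x2), (not x1 or x2), (not x2) is unsatisfiable.
premises = [[1, 2], [-1, 2], [-2]]
valid_proof = [(0, 1, 1), (3, 2, 2)]  # derives (x2), then the empty clause
bogus_proof = [(0, 2, 1)]             # clause 2 does not contain -1

valid_ok = check_refutation(premises, valid_proof)
bogus_ok = check_refutation(premises, bogus_proof)
```

A third party can re-run such a check without trusting, or even having access
to, the tool that found the proof.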


Tracking Casual or Unrealistic Assumptions. Practical formal verifica- 
tion efforts often make unrealistic assumptions that are later forgotten. As an 
example, most tools assume that the systems we are analyzing are immune to 
single-event upsets, e.g. ionizing particles striking the microprocessor or semicon- 
ductor memory. We sometimes assume compilers and runtime garbage collectors 
are correct. In some cases (e.g. [20]) the environment models used by formal 
verification tools do not capture all possible real-world scenarios. As formal ver- 
ification tools become more powerful and useful we will increasingly need to 
reason about what has been proved and what has not been proved, in order to 
avoid misunderstandings that could lead to security vulnerabilities. In applica- 
tions of security this reasoning about assumptions made will need to interact 
with the treatment of risk and how risk is modified by various mitigations, e.g. 
some mitigations for single-event upsets make the events so unlikely that they 
are not a viable security risk, but still not impossible. This topic has been the 
focus of some attention over the years, e.g. CLINC stack [41], CompCert [3], 
and DeepSpec [7]. We believe that this will become an increasingly important 
problem in the future. 
Distributed Formal Verification in the Cloud. Formal verification tools do 
not take enough advantage of modern data centers via distributing coordinated 
processes. Some examples of work in the right direction include [21, 35,36, 38, 40, 
47]. Especially in the area of program verification and analysis, our community 
still focuses on procedures that work on single computers, or perhaps portfolio 
solvers that try different problem encodings or solvers in parallel. Today large 
formal verification problems are often decomposed manually, and then solved 
in parallel. There has not been much research in methods for automatically 
introducing and managing the reasoning about the decompositions 
in cloud-based distributed systems. This is in part perhaps due to the rules at 
various annual competitions such as SV-COMP, SMT-COMP, and CASC. We 
encourage the participants and organizers of competitions to move to cloud- 
based competitions where solvers have the freedom to use cloud-scale distributed 
computing to solve formal verification problems. Tool developers could build 
AMIs or CloudFormation templates that allow cloud distribution. Perhaps future 
contestants might even make Internet endpoints available with APIs supporting 
SMTLIB or TPTP such that the competition is simply a series of remote API 
calls to each competitor’s implementation. In this case competitors that embrace 
the full power of the cloud will have an advantage, and we will see dramatic 
improvements in the computational power of our formal verification tools. 


Continuous Formal Verification. As discussed previously, we have found 
that it is important to focus on continuous verification: it is not enough to 
simply prove the correctness of a protocol or system once; what we need is to 
continuously prove the desired property during the lifetime of the system [24]. 
This matches reports from elsewhere in industry where formal verification is 
being applied, e.g. [45]. An interesting consequence of our focus on continuous 
formal verification is that the time and effort spent finding an initial proof before 
a system is deployed is not as expensive as the time spent maintaining the 
proof later, as the up-front human cost of the pre-launch proof is amortized over 
the lifetime of the system. It would be especially interesting to see approaches 
developed that synthesize new proofs of modified code based on existing proofs 
of unmodified code. 


The Known Problems are Still Problems. Many of the problems that we 
face in AWS are well known to the formal verification community. For exam- 
ple, we need better tools for formal reasoning about languages such as Ruby, 
Python, and JavaScript, e.g. [29,49]. Proofs about security-oriented properties 
of many large open source systems remain an open problem, e.g. Angular, Linux, 
OpenJDK, React, NGINX, Xen. Many formal verification tools are hard to use. 
Many tools are brittle prototypes developed only for the purposes of publication. 
A better understanding of ISAs and memory models (e.g. [19,46]) is also 
key to proving the correctness of code operating on low-level devices. Practical and 
scalable methods for proving the correctness of distributed and/or concurrent 
systems remain an open problem. Improvements to the performance and scalability 
of formal verification tools are needed to prove the correctness of larger 
modules without manual decomposition. Abstraction refinement continues to be 
a problem, as false bugs are expensive to triage in an industrial setting. Buggy 
(and thus unsound) proof-based tools erode trust in formal verification among the 
users who are trying to deploy them. 


5 Conclusion 


In this paper we have discussed how formal verification contributes to the ability 
of AWS to quickly develop and deploy new features while simultaneously increas- 
ing the security of the AWS cloud infrastructure. We also discussed how formal 
verification techniques contribute to customer-facing AWS services. In this paper 
we have outlined some challenges we face. We actively seek solutions to these 
problems and are happy to collaborate with partners in this pursuit. We look 
forward to more partnerships, more tools, more collaboration, and more sharing 
of information as we try to bring affordable, efficient and secure computation to 
all. 
Abstract. The recent growth of the blockchain technology market puts 
its main cryptocurrencies in the spotlight. Among them, Ethereum 
stands out due to its virtual machine (EVM) supporting smart con- 
tracts, i.e., distributed programs that control the flow of the digital cur- 
rency Ether. Being written in a Turing complete language, Ethereum 
smart contracts allow for expressing a broad spectrum of financial appli- 
cations. The price for this expressiveness, however, is a significant seman- 
tic complexity, which increases the risk of programming errors. Recent 
attacks exploiting bugs in smart contract implementations call for the 
design of formal verification techniques for smart contracts. This, how- 
ever, requires rigorous semantic foundations, a formal characterization of 
the expected security properties, and dedicated abstraction techniques 
tailored to the specific EVM semantics. This work will overview the 
state-of-the-art in smart contract verification, covering formal seman- 
tics, security definitions, and verification tools. We will then focus on 
EtherTrust [1], a framework for the static analysis of Ethereum smart 
contracts which includes the first complete small-step semantics of EVM 
bytecode, the first formal characterization of a large class of security 
properties for smart contracts, and the first static analysis for EVM 
bytecode that comes with a proof of soundness. 


1 Introduction 


Blockchain technologies promise secure distributed computations even in absence 
of trusted third parties. The core of this technology is a distributed ledger that 
keeps track of previous transactions and the state of each account, and whose 
functionality and security is ensured by a careful combination of incentives and 
cryptography. Within this framework, software developers can implement sophis- 
ticated distributed, transaction-based computations by leveraging the scripting 
language offered by the underlying cryptocurrency. While many of these cryp- 
tocurrencies have an intentionally limited scripting language (e.g., Bitcoin [2]), 
Ethereum was designed from the ground up with a quasi Turing-complete language.¹ 
Ethereum programs, called smart contracts, have thus found a variety of 


1 While the language itself is Turing complete, computations are associated with a 
bounded computational budget (called gas), which gets consumed by each instruction 
thereby enforcing termination. 
appealing use cases, such as auctions [3], data management systems [4], financial 
contracts [5], elections [6], trading platforms [7,8], permission management [9] 
and verifiable cloud computing [10], just to mention a few. Given their finan- 
cial nature, bugs and vulnerabilities in smart contracts may lead to catastrophic 
consequences. For instance, the infamous DAO vulnerability [11] recently led to 
a $60M financial loss, and similar vulnerabilities occur on a regular basis [12,13]. 
Furthermore, many smart contracts in the wild are intentionally fraudulent, as 
highlighted in a recent survey [14]. 

A rigorous security analysis of smart contracts is thus crucial for society's 
trust in blockchain technologies and their widespread deployment. Unfortunately, 
this task is quite challenging for various reasons. First, Ethereum smart 
contracts are developed in an ad-hoc language, called Solidity, which resembles 
JavaScript but features specific transaction-oriented mechanisms and a number 
of non-standard semantic behaviours, as further described in this paper. Second, 
smart contracts are uploaded on the blockchain in the form of Ethereum Vir- 
tual Machine (EVM) bytecode, a stack-based low-level code featuring dynamic 
code creation and invocation and, in general, very little static information, which 
makes it extremely difficult to analyze. 


Our Contributions. This work overviews the existing approaches taken 
towards formal verification of Ethereum smart contracts and discusses 
EtherTrust, the first sound static analysis tool for EVM bytecode. Specifically, 
our contributions are 


— A survey on recent theories and tools for formal verification of Ethereum 
smart contracts including a systematization of existing work with an overview 
of the open problems and future challenges in the smart contract realm. 

— An illustrative presentation of the small-step semantics presented by [15] with 
special focus on the semantics of the bytecode instructions that allow for the 
initiation of internal transactions. The subtleties in the semantics of these 
transactions have been shown to form an integral part of the attack surface in the 
context of Ethereum smart contracts. 

— A review of an abstraction based on Horn clauses for soundly over- 
approximating the small-step executions of Ethereum bytecode [1]. 

— A demonstration of how relevant security properties can be over-approximated 
and automatically verified using the static analyzer EtherTrust [1] by the example 
of the single-entrancy property defined in [15]. 


Outline. The remainder of this paper is organized as follows. Section 2 briefly 
overviews the Ethereum architecture, Sect. 3 reviews the state of the art in formal 
verification of Ethereum smart contracts, Sect. 4 revisits the Ethereum small-step 
semantics introduced by [15], Sect. 5 presents the single-entrancy property for 
smart contracts as defined by [15], Sect. 6 discusses the key ideas of the first 
sound static analysis for Ethereum bytecode as implemented in EtherTrust [1], 
Sect. 7 shows how reachability properties can automatically be checked using 
EtherTrust, and Sect. 8 concludes by summarizing the key points of the paper. 
2 Background on Ethereum 


In the following we briefly review the mechanics of the cryptocurrency 
Ethereum and its built-in scripting language EVM bytecode. 


2.1 Ethereum 


Ethereum is a cryptographic currency system built on top of a blockchain. Simi- 
lar to Bitcoin, network participants publish transactions to the network that are 
then grouped into blocks by distinct nodes (the so-called miners) and appended 
to the blockchain using a proof of work (PoW) consensus mechanism. The state 
of the system — that we will also refer to as global state — consists of the state 
of the different accounts populating it. An account can either be an external 
account (belonging to a user of the system) that carries information on its cur- 
rent balance or it can be a contract account that additionally obtains persistent 
storage and the contract’s code. Account balances are given in the subunit 
wei of the virtual currency Ether.² 

Transactions can alter the state of the system by either creating new contract 
accounts or by calling an existing account. Calls to external accounts can only 
transfer Ether to this account, but calls to contract accounts additionally execute 
the code associated to the contract. The contract execution might alter the 
storage of the account or might again perform transactions — in this case we talk 
about internal transactions. 

The execution model underlying the execution of contract code is described 
by a virtual state machine, the Ethereum Virtual Machine (EVM). This is quasi 
Turing complete as the otherwise Turing complete execution is restricted by the 
upfront defined resource gas that effectively limits the number of execution steps. 
The originator of the transaction can specify the maximal gas that should be 
spent for the contract execution and also determines the gas price (the amount 
of wei to pay for a unit of gas). Upfront, the originator pays for the gas limit 
according to the gas price and in case of successful contract execution that did 
not spend the whole amount of gas dedicated to it, the originator gets reimbursed 
with gas that is left. The remaining wei paid for the used gas are given as a fee 
to a beneficiary address specified by the miner. 
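
The gas settlement described above can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: the names `GasSettlement` and `settle_gas` are hypothetical, only the case of a successful execution is covered, and any further refund mechanisms of the actual protocol are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GasSettlement:
    upfront_wei: int   # paid by the originator before execution starts
    refund_wei: int    # reimbursed to the originator for unused gas
    fee_wei: int       # given to the beneficiary address chosen by the miner


def settle_gas(gas_limit: int, gas_price_wei: int, gas_used: int) -> GasSettlement:
    """Settle the gas payment of a successfully executed transaction."""
    if not 0 <= gas_used <= gas_limit:
        raise ValueError("gas_used must lie between 0 and gas_limit")
    upfront = gas_limit * gas_price_wei                 # gas limit times gas price
    refund = (gas_limit - gas_used) * gas_price_wei     # unused gas is reimbursed
    return GasSettlement(upfront, refund, upfront - refund)


# Hypothetical numbers: a limit of 100000 gas at 20 wei per unit, 21000 gas used.
settlement = settle_gas(gas_limit=100_000, gas_price_wei=20, gas_used=21_000)
print(settlement)
```

In this model the fee always equals the gas actually used multiplied by the gas price, which matches the statement that the wei paid for the used gas go to the beneficiary.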


2.2 EVM Bytecode 


Contracts are delivered and executed in EVM bytecode format, an assembler-like 
bytecode language. As the core of the EVM is a stack-based machine, the 
set of instructions in EVM bytecode consists mainly of standard instructions 
for stack operations, arithmetic, jumps and local memory access. The classical 
set of instructions is enriched with an opcode for the SHA3 hash and several 
opcodes for accessing the environment that the contract was called in. In addi- 
tion, there are opcodes for accessing and modifying the storage of the account 


2 One Ether is equivalent to 10^18 wei. 
currently running the code and distinct opcodes for performing internal call and 
create transactions. Another instruction particular to the blockchain setting is 
the SELFDESTRUCT opcode, which deletes the currently executed contract, but 
only after the successful execution of the external transaction. 

The execution of each instruction consumes a positive amount of gas. The 
sender of the transaction specifies a gas limit and exceeding it results in an 
exception that reverts the effects of the current transaction on the global state. 
In the case of nested transactions, the occurrence of an exception only reverts 
its own effects, but not those of the calling transaction. Instead, the failure of 
an internal transaction is only indicated by writing zero to the caller’s stack. 
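
The revert behaviour of nested transactions can be sketched in Python as follows. This is a toy model, not EVM code: the global state is a plain dictionary, `OutOfGas` stands in for any exceptional halt, and all function names are hypothetical.

```python
import copy


class OutOfGas(Exception):
    """Stands in for any exceptional halt of an internal transaction."""


def internal_call(state, caller_stack, callee):
    """Run `callee` on a snapshot and commit its effects only on success."""
    snapshot = copy.deepcopy(state)
    try:
        callee(snapshot)
    except OutOfGas:
        caller_stack.append(0)   # failure is only signalled on the caller's stack
        return state             # the callee's effects are discarded
    caller_stack.append(1)
    return snapshot


def failing_callee(state):
    state["callee_storage"] = 42   # effect that has to be rolled back
    raise OutOfGas()


def caller(state):
    stack = []
    state = dict(state, caller_storage=1)                  # the caller's own effect
    state = internal_call(state, stack, failing_callee)    # nested transaction fails
    return state, stack


final_state, final_stack = caller({"caller_storage": 0, "callee_storage": 0})
print(final_state, final_stack)
```

The caller's own write survives, the callee's write does not, and the caller learns about the failure only through the zero on its stack. A caller that ignores this value continues as if the call had succeeded.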


3 Overview on Formal Verification Approaches 


In the following we give an overview on the approaches taken so far in the direc- 
tion of securing (Ethereum) smart contracts. We distinguish between verification 
approaches and design approaches. According to our terminology, the goal of 
verification approaches is to check smart contracts written in existing languages 
(such as Solidity) for their compliance with a security policy or specification. In 
contrast, design approaches aim at facilitating the creation of secure smart con- 
tracts by providing frameworks for their development: These approaches encom- 
pass new languages which are more amenable to verification, provide a clear and 
simple semantics that is understandable by smart contract developers or allow 
for a direct encoding of desired security policies. In addition, we count works that 
aim at providing design patterns for secure smart contracts to this category. 


3.1 Verification 


In the field of smart contract verification we categorize the existing approaches 
along the following dimensions: target language (bytecode vs high level lan- 
guage), point of verification (static vs. dynamic analysis methods), provided 
guarantees (bug-finding vs. formal soundness guarantees), checked properties 
(generic contract properties vs. contract specific properties), degree of automa- 
tion (automated verification vs. assisted analysis vs. manual inspection). From 
the current spectrum of analysis tools, we can find solutions in the following 
clusters: 


Static Analysis Tools for Automated Bug-Finding. Oyente [16] is a state- 
of-the-art static analysis tool for EVM bytecode that relies on symbolic execu- 
tion. Oyente supports a variety of pre-defined security properties, such as trans- 
action order dependency, time-stamp dependency, and reentrancy that can be 
checked automatically. However, Oyente strives for neither soundness nor completeness. 
This is on the one hand due to the simplified semantics that serves as 
foundation of the analysis [15]. On the other hand, the security properties are 
rather syntactic or pattern based and are lacking a semantic characterization. 
Recently, Zhou et al. proposed the static analysis tool SASC [17] that extends 
Oyente by additional patterns and provides a visualization of detected risks in 
the topology diagram of the original Solidity code. 

Majan [18] extends the approach taken in Oyente to trace properties that 
consider multiple invocations of one smart contract. Like Oyente, it relies on symbolic 
execution that follows a simplified version of the semantics used in Oyente 
and uses a pattern-based approach for defining the concrete properties to be 
checked. The tool covers safety properties (such as prodigality and suicidality) 
and liveness properties (greediness). As for Oyente, the authors do not make any 
security claims, but consider their tool a ‘bug catching approach’. 


Static Analysis Tools for Automated Verification of Generic Proper- 
ties. In contrast to the aforementioned class of tools, this line of research aims 
at providing formal guarantees for the analysis results. 

A recently published work is the static analysis tool ZEUS [19] that analyzes 
smart contracts written in Solidity using symbolic model checking. The analysis 
proceeds by translating Solidity code to an abstract intermediate language that 
again is translated to LLVM bitcode. Finally, existing symbolic model checking 
tools for LLVM bitcode are leveraged for checking generic security properties. 
ZEUS consequently only allows for analyzing contracts whose Solidity source 
code is made available. In addition, the semantics of the intermediate language 
cannot easily be reconciled with the actual Solidity semantics that is determined 
by its translation to EVM bytecode. This is because the semantics of the intermediate 
language by design does not allow for reverting the global system state 
in the case of a failed call, which, however, is a fundamental feature of Ethereum 
smart contract execution. 

Other tools proposed in the realm of automated static analysis for generic 
properties are Securify [20], Mythril [21] and Manticore [22] (for analyzing bytecode) 
and SmartCheck [23] and Solgraph [24] (for analyzing Solidity code). These 
tools, however, are not accompanied by any academic paper, so their concrete 
analysis goals remain unspecified. 


Frameworks for Semi-automated Proofs for Contract Specific Prop- 
erties. Hirai [25] formalizes the EVM semantics in the proof assistant 
Isabelle/HOL and uses it for manually proving safety properties for concrete 
contracts. This semantics, however, constitutes a sound over-approximation of 
the original semantics [26]. Building on top of this work, Amani et al. pro- 
pose a sound program logic for EVM bytecode based on separation logics [27]. 
This logic allows for semi-automatically reasoning about correctness properties 
of EVM bytecode using the proof assistant Isabelle/HOL. 

Hildebrandt et al. [28] define the EVM semantics in the K framework [29] 
— a language independent verification framework based on reachability logics. 
The authors leverage the power of the K framework in order to automatically 
derive analysis tools for the specified semantics, presenting as an example a gas 
analysis tool, a semantic debugger, and a program verifier based on reachability 
logics. The derived program verifier still requires the user to manually specify 
loop invariants on the bytecode level. 

Bhargavan et al. [30] introduce a framework to analyze Ethereum contracts 
by translation into F*, a functional programming language aimed at program 
verification and equipped with an interactive proof assistant. The translation 
supports only a fragment of the EVM bytecode and does not come with a jus- 
tifying semantic argument. 


Dynamic Monitoring for Predefined Security Properties. Grossman et 
al. [31] propose the notion of effectively callback free executions and identify the 
absence of this property in smart contract executions as the source of common 
bugs such as reentrancy. They propose an efficient online algorithm for discov- 
ering executions violating effectively callback freeness. Implementing a corre- 
sponding monitor in the EVM would guarantee the absence of the potentially 
dangerous smart contract executions, but is not compatible with the current 
Ethereum version and would require a hard fork. 

A dynamic monitoring solution compatible with Ethereum is offered by the 
tool DappGuard [32]. The tool actively monitors the incoming transactions to a 
smart contract and leverages the tool Oyente [16], its own analysis engine, and 
a simulation of the transaction on the testnet for judging whether the incom- 
ing transaction might cause a (generic) security violation (such as transaction 
order dependency). If a transaction is considered harmful, a counter transaction 
(killing the contract or performing some other fixes) is made. The authors claim 
that this transaction will be mined with high probability before the problematic 
one. Due to this uncertainty and the bug-finding tools used for evaluation of 
incoming transactions, this approach does not provide any guarantees. 


3.2 Design 


The current research on secure smart contract design focuses on the following 
four areas: high-level programming languages, intermediate languages (for veri- 
fication), security patterns for existing languages and visual tools for designing 
smart contracts. 


High-Level Languages. One line of research on high-level smart contract lan- 
guages concentrates on the facilitation of secure smart contract design by limiting 
the language expressiveness and enforcing strong static typing discipline. Sim- 
plicity [33] is a typed functional programming language for smart contracts that 
disallows loops and recursion. It is a general purpose language for smart contracts 
and not tailored to the Ethereum setting. Simplicity comes with a denotational 
semantics specified in Coq that allows for reasoning formally about Simplicity 
contracts. As there is no (verified) compiler to EVM bytecode so far, such results 
don’t carry over to Ethereum smart contracts. In the same realm, Pettersson and 
Edström [34] propose a library for the programming language Idris that allows 
for the development of secure smart contracts using dependent and polymorphic 
types. They extend the existing Idris compiler with a generator for Serpent code 
(a Python-like high-level language for Ethereum smart contracts). This compiler 
is a proof of concept and fails to compile more advanced contracts (as it cannot 
handle recursion). In a preliminary work, Coblenz [35] proposes Obsidian, an 
object-oriented programming language that pursues the goal of preventing com- 
mon bugs in smart contracts such as reentrancy. To this end, Obsidian makes 
states explicit and uses a linear type system for quantities of money. 

Another line of research focuses on designing languages that allow for encod- 
ing security policies that are dynamically enforced at runtime. A first step in 
this direction is sketched in the preliminary work on Flint [36], a type-safe, 
capabilities-secure, contract-oriented programming language for smart contracts 
that gets compiled to EVM bytecode. Flint allows for defining caller capabilities 
restricting the access to security sensitive functions. These capabilities shall be 
enforced by the EVM bytecode created during compilation. But so far, there is 
only an extended abstract available. 

In addition to these approaches from academia, the Ethereum foundation 
currently develops the high-level languages Viper [37] and Bamboo [38]. Fur- 
thermore, the Solidity compiler used to support a limited export functionality 
to the intermediate language WhyML [39] allowing for a pre/post condition style 
reasoning on Solidity code by leveraging the deductive program verification plat- 
form Why3 [40]. 


Intermediate Languages. The intermediate language Scilla [41] comes with a 
semantics formalized in the proof assistant Coq and therefore allows for a mech- 
anized verification of Scilla contracts. In addition, Scilla makes some interesting 
design choices that might inspire the development of future high level languages 
for smart contracts: Scilla provides a strict separation not only between compu- 
tation and communication, but also between pure and effectful computations. 


Security Patterns. Wöhrer [42] describes programming patterns in Solidity 
that should be adopted by smart contract programmers for avoiding common 
bugs. These patterns encompass best coding practices such as performing calls 
at the end of a function, but also off-the-shelf solutions for common security 
bugs such as locking a contract for avoiding reentrancy or the integration of a 
mechanism that allows the contract owner to disable sensitive functionalities in 
the case of a bug. 
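
To make the locking pattern concrete, the following Python sketch simulates a contract that calls out to its caller before updating its own bookkeeping. It is a toy model under stated assumptions, not Solidity code: the class `Bank`, the callback `on_receive`, and the function `attack` are hypothetical names, and the callback plays the role of code that the recipient runs when it receives funds.

```python
class Bank:
    """Toy contract holding balances, optionally protected by a lock."""

    def __init__(self, use_lock):
        self.balances = {}
        self.funds = 0
        self.locked = False
        self.use_lock = use_lock

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.funds += amount

    def withdraw(self, who, on_receive):
        if self.use_lock and self.locked:
            return False                  # reject reentrant invocations
        amount = self.balances.get(who, 0)
        if amount == 0 or self.funds < amount:
            return False
        self.locked = True
        self.funds -= amount
        on_receive(amount)                # control is handed to the recipient
        self.balances[who] = 0            # bookkeeping happens too late
        self.locked = False
        return True


def attack(bank):
    """Deposit 10, then try to withdraw twice through a reentrant callback."""
    received = []

    def on_receive(amount):
        received.append(amount)
        if len(received) < 2:
            bank.withdraw("attacker", on_receive)   # reenter before the update

    bank.deposit("victim", 10)
    bank.deposit("attacker", 10)
    bank.withdraw("attacker", on_receive)
    return sum(received)


stolen_without_lock = attack(Bank(use_lock=False))
stolen_with_lock = attack(Bank(use_lock=True))
print(stolen_without_lock, stolen_with_lock)
```

Without the lock the attacker also drains the victim's deposit; with the lock the reentrant call is rejected. Moving the balance update before the external call, in line with the practice of performing calls at the end of a function, would remove the problem in this model even without a lock.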


Tools. Mavridou and Laszka [43] introduce a framework for designing smart 
contracts in terms of finite state machines. They provide a tool with a graphical 
editor for defining contract specifications as automata and give a translation 
of the constructed finite state machines to Solidity. In addition, they present 
some security extensions and patterns that can be used as off-the-shelf solutions 
for preventing reentrancy and implementing common security challenges such 
as time constraints and authorization. The approach, however, lacks formal 
foundations: the translation is not proven correct, nor 
are the security patterns shown to meet the desired security goals. 


3.3 Open Challenges 


Even though the previous section highlights the wide range of steps taken 
towards the analysis of Ethereum smart contracts, there are still a lot of open 
challenges left. 


Secure Compilation of High-Level Languages. Even though there are 
several proposals made for new high-level languages that facilitate the design of 
secure smart contracts and that are more amenable to verification, none of them 
comes so far with a verified compiler to EVM bytecode. Such a secure compilation 
however is the requirement for the results shown on high-level language programs 
to carry over to the actual smart contracts published on the blockchain. 


Specification Languages for Smart Contracts. So far, all approaches to 
verifying contract specific properties focus on either ad-hoc specifications in the 
used verification framework [25, 27, 28, 30] or the insertion of assertions into exist- 
ing contract code [39]. For leveraging the power of existing model checking tech- 
niques for program verification, the design of a general-purpose contract speci- 
fication language would be needed. 


Study of Security Policies. There has been no fundamental research made so 
far on the classes of security policies that might be interesting to enforce in the 
setting of smart contracts. In particular, it would be compelling to characterize 
the class of security policies that can be enforced by smart contracts within the 
existing EVM. 


Compositional Reasoning About Smart Contracts. Most research on 
smart contract verification focuses on reasoning about individual contracts or at 
most a set of contracts whose bytecode is fully available. Even though there 
has been work observing the similarities between smart contracts and concurrent 
programs [44], there has been no rigorous study on compositional reasoning for 
smart contracts so far. 


4 Semantics 


Recently, Grishchenko et al. [15] introduced the first complete small-step seman- 
tics for EVM bytecode. As this semantics serves as a basis for the static analyzer 
EtherTrust, we briefly review in the following the general layout and the 
most important features of the semantics. 
4.1 Execution Configurations 


Before discussing the small-step rules of the semantics, we first introduce the 
general shape of execution configurations. 


Global State. The global state of the Ethereum blockchain is represented as 
a (partial) mapping from account addresses to accounts. In the case that an 
account does not exist, we assume it to map to $\bot$. Accounts are composed of a 
nonce n that is incremented with every other account that the account creates, a 
balance b, a persistent unbounded storage stor and the account’s code. External 
accounts carry an empty code which makes their storage inaccessible and hence 
irrelevant. 


Small-Step Relation. The semantics is formalized by a small-step relation 
$\Gamma \vDash S \rightarrow S'$ that specifies how a call stack $S$ representing the state of the 
execution evolves within one step under the transaction environment $\Gamma$. We call 
the pair $(\Gamma, S)$ a configuration. 


Transaction Environments. The transaction environment represents the 
static information of the block that the transaction is executed in and the 
immutable parameters given to the transaction, such as the gas price or the gas limit. 
These parameters can be accessed by distinct bytecode instructions and consequently 
influence the transaction execution. 


Call Stacks. A call stack S is a stack of execution states which represents the 
state of the overall execution of the initial external transaction. The individual 
execution states of the stack represent the states of the uncompleted internal 
transactions performed during the execution. Formally, a call stack is a stack 
of regular execution states of the form $(\mu, \iota, \sigma)$ that can optionally be topped 
with a halting state $\mathit{HALT}(\sigma, \mathit{gas}, d)$ or an exception state $\mathit{EXC}$. Semantically, 
halting states indicate regular halting of an internal transaction, exception states 
indicate exceptional halting, and regular execution states describe the state of 
internal transactions in progress. Halting and exception states can only occur as 
top elements of the call stack as they represent terminated internal transactions. 
Halting states carry the information affecting the callee state such as the global 
state $\sigma$ that the internal execution halted in, the unspent gas $\mathit{gas}$ from the 
internal transaction execution and the return data d. 

The state of a non-terminated internal transaction is described by a regular 
execution state of the form $(\mu, \iota, \sigma)$. The state is determined by the current 
global state $\sigma$ of the system as well as the execution environment $\iota$ that specifies 
the parameters of the current transaction (including inputs and the code to be 
executed) and the local state $\mu$ of the stack machine. 
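Table 1. Semantic rules for ADD 

$$
\frac{
  \iota.\mathsf{code}[\mu.\mathsf{pc}] = \mathsf{ADD} \qquad
  \mu.\mathsf{s} = a :: b :: s \qquad
  \mu.\mathsf{gas} \geq 3 \qquad
  \mu' = \mu[\mathsf{s} \rightarrow (a + b) :: s][\mathsf{pc} \mathrel{+}= 1][\mathsf{gas} \mathrel{-}= 3]
}{
  \Gamma \vDash (\mu, \iota, \sigma) :: S \rightarrow (\mu', \iota, \sigma) :: S
}
$$

ADD-FAIL 
$$
\frac{
  \iota.\mathsf{code}[\mu.\mathsf{pc}] = \mathsf{ADD} \qquad
  (|\mu.\mathsf{s}| < 2 \lor \mu.\mathsf{gas} < 3)
}{
  \Gamma \vDash (\mu, \iota, \sigma) :: S \rightarrow \mathit{EXC} :: S
}
$$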


Execution Environment. The execution environment $\iota$ of an internal transaction 
is a tuple of static parameters (actor, input, sender, value, code) to the 
transaction that, among other things, determine the code to be executed and the account in 
whose context the code will be executed. The execution environment incorpo- 
rates the following components: the active account actor that is the account that 
is currently executing and whose account will be affected when instructions for 
storage modification or money transfer are performed; the input data input given 
to the transaction; the address sender of the account that initiated the trans- 
action; the amount of wei value transferred with the transaction; the code code 
that is executed by the transaction. The execution environment is determined 
upon initialization of an internal transaction execution, and it can be accessed, 
but not altered during the execution. 


Machine State. The local machine state $\mu$ represents the state of the underlying 
stack machine used for execution. Formally it is represented by a tuple 
(gas, pc, m, aw, s) holding the amount of gas gas available for execution, the pro- 
gram counter pc, the local memory m, the number of active words in memory 
aw, and the machine stack s. 

The execution of each internal transaction starts in a fresh machine state, 
with an empty stack, memory initialized to all zeros, and program counter and 
active words in memory set to zero. Only the gas is instantiated with the gas 
value available for the execution. We call execution states with machine states 
of this form initial. 
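
The components of an execution configuration introduced in this section can be summarized as Python data classes. This is only a sketch under stated assumptions: the class and field names are hypothetical renderings of the tuples above, addresses and words are modelled as plain integers, and memory is modelled as a dictionary whose missing entries are read as zero.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Account:
    nonce: int = 0
    balance: int = 0                                         # in wei
    storage: Dict[int, int] = field(default_factory=dict)    # persistent storage
    code: bytes = b""                                        # empty for external accounts


# Global state: partial mapping from addresses to accounts.
# A missing key plays the role of the undefined account.
GlobalState = Dict[int, Account]


@dataclass
class MachineState:
    gas: int
    pc: int = 0
    memory: Dict[int, int] = field(default_factory=dict)     # unset cells read as zero
    active_words: int = 0
    stack: List[int] = field(default_factory=list)


@dataclass(frozen=True)                                       # accessible but not alterable
class ExecutionEnvironment:
    actor: int
    input: bytes
    sender: int
    value: int
    code: bytes


@dataclass
class RegularState:
    mu: MachineState
    iota: ExecutionEnvironment
    sigma: GlobalState


def initial_state(gas, iota, sigma):
    """Build an initial execution state: fresh machine state, only gas is set."""
    return RegularState(MachineState(gas=gas), iota, sigma)


env = ExecutionEnvironment(actor=1, input=b"", sender=2, value=0, code=b"\x01")
state = initial_state(gas=100, iota=env, sigma={1: Account(code=b"\x01"), 2: Account(balance=5)})
print(state.mu)
```

A call stack can then be modelled as a list of `RegularState` values, optionally topped by a halting or exception marker.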


4.2 Small-Step Rules 


In the following, we will present a selection of interesting small-step rules in 
order to illustrate the most important features of the semantics. 


Local Instructions. For demonstrating the overall design of the semantics, we 
start with the example of the arithmetic instruction ADD, which performs addition 
of two values on the machine stack. The small-step rules for ADD are shown 
in Table 1. We use dot notation to access components of the different 
state parameters. We name the components with the variable names introduced 
for these components in the last section, written in sans-serif style. In addition, 
we use the usual notation for updating components: $t[c \rightarrow v]$ denotes that the 
component $c$ of tuple $t$ is updated with value $v$. For expressing incremental 
updates in a simpler way, we additionally use the notation $t[c \mathrel{+}= v]$ to denote 
that the (numerical) component $c$ of $t$ is incremented by $v$, and similarly $t[c \mathrel{-}= v]$ 
for decrementing a component $c$ of $t$. 

The execution of the arithmetic instruction ADD only performs local changes 
in the machine state affecting the local stack, the program counter, and the 
gas budget. For deciding upon the correct instruction to execute, the currently 
executed code (that is part of the execution environment) is accessed at the 
position of the current program counter. The cost of an ADD instruction consists 
always of three units of gas that get subtracted from the gas budget in the 
machine state. As every other instruction, ADD can fail due to lacking gas or 
due to underflows on the machine stack. In this case, the exception state is 
entered and the execution of the current internal transaction is terminated. For 
better readability, we use here the slightly sloppy V notation for combining the 
two error cases in one inference rule. 
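
The behavior described above can be sketched in a few lines of Python. This is
only an illustration, not the formal rule from Table 1: the names
`MachineState`, `initial`, `step_add` and the marker `EXC` are hypothetical, the
stack is stored with its top element first, and the wrap-around at 2^256 reflects
the EVM word size rather than anything stated in this section.

```python
from dataclasses import dataclass, replace
from typing import Tuple, Union

WORD = 2 ** 256  # EVM words are 256 bits wide (assumption of this sketch)
ADD_GAS = 3      # cost of ADD as stated in the text
EXC = "EXC"      # hypothetical stand-in for the exception state


@dataclass(frozen=True)
class MachineState:
    gas: int
    pc: int
    m: Tuple[Tuple[int, int], ...]  # memory as (address, value) pairs; empty = all zeros
    aw: int
    s: Tuple[int, ...]              # machine stack, top element first


def initial(gas: int) -> MachineState:
    """Fresh machine state: everything is zero or empty except the gas."""
    return MachineState(gas=gas, pc=0, m=(), aw=0, s=())


def step_add(mu: MachineState) -> Union[MachineState, str]:
    """One small step, assuming the instruction at mu.pc is ADD."""
    # Both error cases (insufficient gas, stack underflow) lead to the exception state.
    if mu.gas < ADD_GAS or len(mu.s) < 2:
        return EXC
    a, b, *rest = mu.s
    # Only the stack, the program counter and the gas budget change.
    return replace(mu, s=((a + b) % WORD, *rest), pc=mu.pc + 1, gas=mu.gas - ADD_GAS)
```

For instance, `step_add(replace(initial(10), s=(2, 3, 7)))` yields a state with
stack `(5, 7)`, program counter 1 and 7 units of gas left, while calling
`step_add` on an empty stack returns `EXC`.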


Transaction Initiating Instructions. A class of instructions with a more 
involved semantics are those instructions initiating internal transactions. This 
class incorporates instructions for calling another contract (CALL, CALLCODE 
and DELEGATECALL) and for creating a new contract (CREATE). We will 
explain the semantics of those instructions in an intuitive way omitting tech- 
nical details. 

The call instructions initiate a new internal call transaction whose parameters 
are specified on the machine stack — including the recipient (callee) and the 
amount of money to be transferred (in the case of CALL and CALLCODE). In 
addition, the input to the call is specified by providing the corresponding local 
memory fragment and analogously a memory fragment for the return value. 

When executing a call instruction, the specified amount of wei is transferred 
to the callee and the code of the callee is executed. The different call types 
diverge in the environment that the callee code is executed in. In the case of 
a CALL instruction, while executing the callee code (only) the account of the 
callee can be accessed and modified. So intuitively, the control is completely 
handed to the callee as its code is executed in its own context. In contrast, in 
the case of CALLCODE, the executed callee code can (only) access and modify 
the account of the caller. So the callee's code is executed in the caller's context 
which might be useful for using library functionalities implemented in a separate 
library contract that e.g., transfer money on behalf of the caller. 

This idea is pushed even further in the DELEGATECALL instruction. This call 
type does not allow for transferring money and executes the callee's code not 
only in the caller's context, but even preserves part of the execution environment 
of the previous call (in particular the call value and the sender information). 
Intuitively, this instruction resembles adding the callee's code to the caller as
an internal function so that calling it does not cause a new internal transaction
(even though it formally does).

Fig. 1. Illustration of the semantics of different call types

Figure 1 summarizes the behavior of the different call instructions in EVM 
bytecode. The executed code of the respective account is highlighted in orange 
while the accessible account state is depicted in green. The remaining inter- 
nal transaction information (as specified in the execution environment) on the 
sender of the internal transaction and the transferred value are marked in vio- 
let. In addition, the picture relates the corresponding changes to the small-step 
semantics: the execution of a call transaction adds a new execution state to 
the call stack while preserving the old one. The new global state $\sigma'$
records the changes in the accounts' balances, while the new execution
environment $\iota'$ determines the accessible account (by setting the actor of
the internal transaction correspondingly), the code to be executed (by setting
code) and further accessible transaction information such as the sender, value
and input (by setting sender, value and input respectively).
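
The differences between the three call types can be summarized as a small
function that computes the execution environment of the callee. This is a
sketch under simplifying assumptions: `Env` and `callee_env` are hypothetical
names, accounts are identified by plain strings, and only the four components
discussed above (actor, code, sender, value) are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Env:
    actor: str   # account whose state can be accessed and modified
    code: str    # account whose code is executed
    sender: str  # sender as seen by the executed code
    value: int   # transferred value as seen by the executed code


def callee_env(kind: str, caller_env: Env, callee: str, value: int = 0) -> Env:
    """Environment in which the callee's code runs, depending on the call type."""
    caller = caller_env.actor
    if kind == "CALL":
        # Control is handed over completely: the callee runs in its own context.
        return Env(actor=callee, code=callee, sender=caller, value=value)
    if kind == "CALLCODE":
        # The callee's code runs in the caller's context.
        return Env(actor=caller, code=callee, sender=caller, value=value)
    if kind == "DELEGATECALL":
        # Additionally, sender and value of the previous call are preserved.
        return Env(actor=caller, code=callee,
                   sender=caller_env.sender, value=caller_env.value)
    raise ValueError(f"not a call instruction: {kind}")
```

With `env = Env("A", "A", "U", 5)`, a DELEGATECALL to `"B"` gives
`Env("A", "B", "U", 5)`: the library code of `B` acts on account `A` and still
sees the original sender `U` and value 5.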

Fig. 2. Illustration of the semantics of the CREATE instruction (Color figure online)


The CREATE instruction initiates an internal transaction that creates a new 
account. The semantics of this instruction is similar to the one of CALL, with 
the exception that a fresh account is created, which gets the specified value 
transferred, and that the input provided to this internal transaction, which is 
again specified in the local memory, is interpreted as the initialization code to 
be executed in order to produce the newly created account’s code as output. 
Figure 2 depicts the semantics of the CREATE instruction in a similar fashion 
as it is done for the call instructions before. It is notable that the input to the 
CREATE instruction is interpreted as code and executed (therefore highlighted 
in orange) in the context of the newly created contract (highlighted in green). 
During this execution the newly created contract does not have any contract 
code itself (therefore depicted in gray), but only after completing the internal 
transaction the return value of the transaction will be set as code for the freshly 
created contract. 


5 Security Properties 


Grishchenko et al. [15] propose generic security definitions for smart contracts 
that rule out certain classes of potentially harmful contract behavior. These 
properties constitute trace properties (more precisely, safety properties) as well 
as hyper properties (in particular, value independence properties). In this work, 
we revisit one of these safety properties called single-entrancy and use this prop- 
erty as a case study for showing how safety properties of smart contracts (that 
can be over-approximated by pure reachability properties) can be automatically 
checked by static analysis. For checking value independence properties, in [1] the 
reviewed analysis technique is extended with a simple dependency analysis that 
we will not discuss further in this work. 


5.1 Preliminary Notations 


Formally, contracts are represented as tuples of the form (a,code) where a 
denotes the address of the contract and code denotes the contract’s code. 
In order to give concise security definitions, we further introduce, and assume
throughout the paper, an annotation to the small-step semantics in order to
highlight the contract $c$ that is currently executed. In the case of
initialization code being executed, we use $\bot$. We write $S \mathbin{+\!+} S'$
for the concatenation of call stacks $S$ and $S'$. Finally, for arguing about
EVM bytecode executions, we are only interested in those initial configurations
that might result from a valid external transaction in a valid block. In the
following, we will call these configurations reachable and refer to [15] for a
detailed definition.


5.2 Single-Entrancy 


For motivating the definition of single-entrancy, we introduce a class of bugs in 
Ethereum smart contracts called reentrancy bugs [14,16]. 

The most famous representative of this class is the so-called DAO bug that
led to a loss of 60 million dollars in June 2016 [11]. In an attack exploiting this
bug, the affected contract was drained of money by repeatedly reentering
it and performing transactions to the attacker on behalf of the contract.

The cause of such bugs mostly lies in developers' misunderstanding of the
semantics of Solidity's call primitives. In general, calling a contract can
invoke two kinds of actions: transferring Ether to the contract's account or
executing (parts of) a contract's code. In particular, Solidity's call construct
(being translated to a CALL instruction in EVM bytecode) invokes the execution
of a fraction of the callee's code, specified in the so-called fallback function.
A contract's fallback function is written as a function without name or
arguments, as depicted in the Mallory contract in Fig. 3b.

Consequently, when using the call construct, the developer may expect an
atomic value transfer, whereas potentially another contract's code is executed.
For illustrating how to exploit this sort of bug, we consider the contracts in
Fig. 3.


(a) Smart contract with reentrancy bug

    contract Bob{
      bool sent = false;
      function ping(address c) {
        if (!sent) { c.call.value(2)();
                     sent = true; }}}

(b) Smart contract exploiting reentrancy bug

    contract Mallory{
      function () {
        Bob(msg.sender).ping(this); }}

Fig. 3. Reentrancy attack


The function ping of contract Bob sends an amount of 2 wei to the address
specified in the argument. However, this should only be possible once, which is
supposedly ensured by the sent variable that is set after the successful money
transfer. Instead, it turns out that invoking the call.value function on a
contract's address invokes the contract's fallback function as well.

Given a second contract Mallory, it is possible to transfer more money than
the intended 2 wei to the account of Mallory. By invoking Bob's function ping with
the address of Mallory's account, 2 wei are transferred to Mallory's account and
additionally the fallback function of Mallory is invoked. As the fallback function
again calls the ping function with Mallory's address, another 2 wei are transferred
before the variable sent of contract Bob is set. This looping goes on until all
gas of the initial call is consumed or the call stack limit is reached. In this
case, only the last transfer of wei is reverted and the effects of all former
calls stay in place. Consequently, the intended restriction on contract Bob's
ping function (namely to only transfer 2 wei once) is circumvented.
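
The attack can be replayed with a toy model in Python. This is a deliberately
simplified sketch and not EVM semantics: gas and the call stack limit are not
modeled, so the recursion stops when the victim's balance is exhausted, and the
helper `call_value` (a hypothetical stand-in for `c.call.value(n)()`) reports
failure by returning `False`. The class `Alice` is a variant of `Bob` that sets
the guard before performing the call.

```python
class Contract:
    def __init__(self, balance: int = 0):
        self.balance = balance

    def fallback(self, sender: "Contract") -> None:
        """Default fallback function: accept the money and do nothing."""


def call_value(sender: Contract, callee: Contract, amount: int) -> bool:
    """Transfer `amount`, then run the callee's fallback function."""
    if sender.balance < amount:
        return False  # the call fails, but the caller keeps running
    sender.balance -= amount
    callee.balance += amount
    callee.fallback(sender)
    return True


class Bob(Contract):
    def __init__(self, balance: int):
        super().__init__(balance)
        self.sent = False

    def ping(self, c: Contract) -> None:
        if not self.sent:
            call_value(self, c, 2)
            self.sent = True  # too late: only set after the call has returned


class Alice(Bob):
    def ping(self, c: Contract) -> None:
        if not self.sent:
            self.sent = True  # guard is set before the call
            call_value(self, c, 2)


class Mallory(Contract):
    def fallback(self, sender: Contract) -> None:
        sender.ping(self)  # reenter the calling contract


def attack(victim: Bob) -> int:
    """Let Mallory attack the victim and return the amount Mallory obtained."""
    mallory = Mallory()
    victim.ping(mallory)
    return mallory.balance
```

In this model, `attack(Bob(10))` returns 10 (the whole balance), whereas
`attack(Alice(10))` returns the intended 2.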

Motivated by these kinds of attacks, the notion of single-entrancy was intro- 
duced. Intuitively, a contract is single-entrant if it cannot perform any more calls 
once it has been reentered. Formally, this property can be expressed in terms of
the small-step semantics as follows:


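Definition 1 (Single-entrancy [15]). A contract $c$ is single-entrant if for all
reachable configurations $(\Gamma, s_c :: S)$, it holds for all $s'$, $s''$, $S'$ that

\[
\begin{aligned}
& \Gamma \vDash s_c :: S \rightarrow^* s'_c :: S' \mathbin{+\!+} s_c :: S \\
& \implies \neg \exists\, s'' \in \mathcal{S},\ c' \in \mathcal{C}_\bot.\;
  \Gamma \vDash s'_c :: S' \mathbin{+\!+} s_c :: S \rightarrow^* s''_{c'} :: s'_c :: S' \mathbin{+\!+} s_c :: S
\end{aligned}
\]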


This property constitutes a safety property. We will show in Sect. 7 how it 
can be appropriately abstracted for being expressed in the EtherTrust analysis 
framework. 
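
To make the shape of this property concrete, the following sketch checks it on a
single, explicitly given execution trace. It is a strong simplification of the
definition: call stacks are reduced to tuples of contract names (currently
executing contract first), execution states are ignored, and the function name
`violates_single_entrancy` is hypothetical.

```python
from typing import Sequence, Tuple

Stack = Tuple[str, ...]  # contract names, currently executing contract first


def violates_single_entrancy(trace: Sequence[Stack], c: str) -> bool:
    """True if, somewhere in the trace, a reentered c performs a further call."""
    for before, after in zip(trace, trace[1:]):
        # c is executing and also occurs further down the call stack.
        reentered = bool(before) and before[0] == c and c in before[1:]
        # The next stack is the old one with exactly one new element on top.
        performs_call = len(after) == len(before) + 1 and after[1:] == before
        if reentered and performs_call:
            return True
    return False
```

A trace in which the reentered contract calls out again, such as
`[("Bob",), ("Mallory", "Bob"), ("Bob", "Mallory", "Bob"), ("Mallory", "Bob", "Mallory", "Bob")]`,
is flagged, while a trace in which the reentered contract simply returns is not.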


Fig. 4. Simplified soundness statement


6 Verification 


Grishchenko et al. [1] developed a static analysis framework for analyzing reach- 
ability properties of EVM smart contracts. This framework relies on an abstract 
semantics for EVM bytecode soundly over-approximating the semantics pre- 
sented in Sect. 4. 

In the following, we will review the abstractions performed on the small-step
configurations and execution rules using the example of the abstract execution
rule for the ADD instruction. Afterwards, we will briefly discuss how call
instructions are over-approximated.


6.1 Abstract Semantics


Figure 4 gives an overview of the relation between the small-step and the
abstract semantics. For the analysis, we consider a particular contract $c^*$
under analysis whose code is known. An over-approximation of the behavior of
this smart contract is encoded in Horn clauses ($\Lambda$). These describe how an
abstract configuration (represented by a set of abstract state predicates) evolves
within the execution of the contract's instructions. Abstract configurations are
obtained by translating small-step configurations to a set $\Pi$ of facts over
state predicates that characterize (an over-approximation of) the original
configuration. This transformation is performed with respect to the contract
$c^*$, as only the local behavior of this particular contract is
over-approximated and consequently only those elements on the call stack
representing executions of $c^*$ are translated. Finally, we will show that no
matter how the contract $c^*$ is called (so for every arbitrary reachable
configuration $(\Gamma, s_{c^*} :: S)$), every sequence of execution steps that
is performed while executing it can be mimicked by a derivation of the abstract
configuration $\Pi_s$ (obtained from translating the execution state $s$) using
the Horn clauses $\Lambda$ (that model the abstract semantics of the contract
$c^*$). More precisely, this means that from the set of facts
$\Pi_s \cup \Lambda$ a set $\Pi$ can be derived that is a coarser abstraction
($<:$) than $\Pi_{S'}$, which is the translation of the execution's intermediate
call stack $S'$. A corresponding formal soundness statement is proven in [1].


6.2 Abstract Configurations 


Table 2 shows the analysis facts used for describing the abstract semantics. These 
consist of (instances of) state predicates that represent partial abstract config- 
urations. Accordingly, abstract configurations are sets of facts not containing 
any variables as arguments. We will refer to such facts as closed facts. Finally, 
abstract contracts are characterized as sets of Horn clauses over the state pred- 
icates (facts) that describe the state changes induced by the instructions at the 
different program positions. Here only those state predicates are depicted that 
are needed for describing the abstract semantics of the ADD instruction. 

The state predicates are parametrized by a program point $pp$ that is a tuple of
the form $(id^*, pc)$ with $id^*$ being a contract identifier for contract $c^*$
and $pc$ being the program counter at which the abstract state holds.³ The
parametrization by the contract identifier helps to make the analysis consider a
set of contracts whose code is known (such as, e.g., library code that is known
to be used by the contract). In this work, however, we focus on the case where
$c^*$, represented by identifier $id^*$, is the only known contract. In addition,
the predicates carry the relative call depth $cd$ as argument. The relative call
depth is the size of the call stack built up on the execution of $c^*$ (cf. call
stack $S'$ in Fig. 4) and serves as abstraction for the (relative) call stack
that contract $c^*$ is currently executed on.


3 Making the program counter a parameter instead of an argument is a design choice
made in order to minimize the number of recursive Horn clauses, which simplifies
automated verification.

Table 2. Analysis facts. All arguments in the analysis facts marked with a hat
($\hat{\cdot}$) range over $D \cup Vars$, where $D$ is the abstract domain and
$Vars$ is the set of variables. All other arguments of analysis facts range over
$\mathbb{N}$, with the exception of $sa$, which ranges over
$(\mathbb{N} \to D) \cup Vars$. Closed facts $cf$ are assumed to be facts with
arguments not coming from $Vars$.

Facts                   $f :=$
  Abs. machine state      $\mathsf{MState}_{pp}((size, sa), \hat{aw}, \hat{gas}, cd)$
  Abs. memory             $\mathsf{Mem}_{pp}(pos, \hat{va}, cd)$
  Abs. exception state    $\mathsf{Exc}_{id^*}(cd)$
                          ...
Abs. configurations     $\Pi := \{cf_1, \dots, cf_n\}$
Horn clauses            $H := \forall \bar{x}.\ \bigwedge_i f_i \Rightarrow f$
Abs. contracts          $\Lambda := \{H_1, \dots, H_n\}$

The relative call depth helps to distinguish different recursive executions of c* 
and thereby improves the precision of the analysis. 

As the ADD instruction only operates on the local machine state, we focus
on the abstract representation of the machine state $\mu$: the state predicates
representing $\mu$ are $\mathsf{MState}_{pp}$ and $\mathsf{Mem}_{pp}$. The fact
$\mathsf{MState}_{pp}((size, sa), \hat{aw}, \hat{gas}, cd)$ says that at program
point $pp$ and relative call depth $cd$ the machine stack is of size $size$ and
its current configuration is described by the mapping $sa$, which maps stack
positions to abstract values, $\hat{aw}$ represents the number of active words
in memory, and $\hat{gas}$ is the remaining gas. Similarly, the fact
$\mathsf{Mem}_{pp}(pos, \hat{va}, cd)$ states that at program point $pp$ and
relative call depth $cd$ at memory address $pos$ there is the (abstract) value
$\hat{va}$. The values on the stack and in local memory range over an abstract
domain. Concretely, we define the abstract domain $D$ to be the set
$\{\bot, \top, a^*\} \cup \mathbb{N}$, which constitutes a bounded lattice
$(D, \sqsubseteq, \sqcup, \sqcap, \top, \bot)$ satisfying
$\bot \sqsubseteq a^* \sqsubseteq \top$ and $\bot \sqsubseteq n \sqsubseteq \top$
for all $n \in \mathbb{N}$. Intuitively, in our analysis $\top$ will represent
unknown (symbolic) values and $a^*$ will represent the unknown (symbolic) address
of contract $c^*$.

Treating the address of the contract under analysis in a symbolic fashion is 
crucial for obtaining a meaningful analysis, as the address of this account on the 
blockchain cannot easily be assumed to be known upfront. Although discussing
this peculiarity is beyond the scope of this paper, a broader presentation of the 
symbolic address paradigm can be found in the technical report [1]. 

For performing operations and comparisons on values from the abstract
domain, we will assume versions of the unary, binary and comparison operators
on the values from $D$. We will mark abstract operators with a hat
($\hat{\cdot}$) and, e.g., write $\hat{+}$ for abstract addition or $\hat{=}$ for
abstract equality. The operators will treat $\top$ and $a^*$ as arbitrary values
so that, e.g., $\top \mathbin{\hat{+}} n$ evaluates to $\top$ and
$\top \mathbin{\hat{=}} n$ evaluates to both true and false for all
$n \in \mathbb{N}$.
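
A minimal Python sketch of such an abstract domain is given below. The names
`leq`, `join`, `abs_add` and `abs_eq` are hypothetical, the special elements are
encoded as strings, and two simplifications are assumed: $\bot$ is not treated
specially by the operators, and the wrap-around of EVM word arithmetic is
omitted. Abstract comparisons return the set of possible truth values, which
makes "evaluates to both true and false" explicit.

```python
from typing import FrozenSet, Union

BOT, TOP, A_STAR = "⊥", "⊤", "a*"
AbsVal = Union[int, str]  # natural numbers plus the three special elements


def leq(x: AbsVal, y: AbsVal) -> bool:
    """Lattice order: ⊥ is below and ⊤ above everything; the rest is incomparable."""
    return x == BOT or y == TOP or x == y


def join(x: AbsVal, y: AbsVal) -> AbsVal:
    """Least upper bound of two abstract values."""
    if leq(x, y):
        return y
    if leq(y, x):
        return x
    return TOP


def abs_add(x: AbsVal, y: AbsVal) -> AbsVal:
    """Abstract addition: precise on numbers, ⊤ as soon as a symbolic value occurs."""
    if isinstance(x, int) and isinstance(y, int):
        return x + y
    return TOP


def abs_eq(x: AbsVal, y: AbsVal) -> FrozenSet[bool]:
    """Abstract equality: the set of truth values the comparison may take."""
    if isinstance(x, int) and isinstance(y, int):
        return frozenset({x == y})
    return frozenset({True, False})
```

Returning a set of Booleans instead of a single Boolean is one possible design
choice; it lets a client of the domain trigger both branches of a conditional
whenever the outcome is not determined.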

Formally, we establish the relation between a concrete machine state $\mu$ and
its abstraction by an abstraction function that translates machine states to a set
of closed analysis facts. Table 3 shows the abstraction function $\alpha_\mu$
that maps a local machine state into an abstract state consisting of a set of
analysis facts. The abstraction is defined with respect to the relative call
depth $cd$ of the execution and a value abstraction function $\hat{\cdot}$ that
maps concrete values into values from the abstract domain. The function
$\hat{\cdot}$ maps all concrete values to the corresponding (concrete) values in
the abstract domain, except for those values that can potentially represent the
address of contract $c^*$; these are translated to $a^*$ and thereby
over-approximated. This treatment might introduce spurious counterexamples with
respect to the concrete execution of the real contract on the blockchain (where
it is assigned a concrete address). On the one hand, this is due to the fact that
by this abstraction the concrete value of the address is assumed to be arbitrary.
On the other hand, abstract computations with $a^*$ always result in $\top$ and
therefore possible constraints on these results are lost. However, the first
source of imprecision should not be considered an imprecision per se: as $c^*$'s
address is not assumed to be known statically, the goal of the abstraction is to
over-approximate the executions with all possible addresses.
The translation proceeds by creating a set of instances of the machine state
predicates. For creating instances of the $\mathsf{MState}_{pp}$ predicate, the
concrete values $aw$ and $gas$ are over-approximated by $\hat{aw}$ and
$\hat{gas}$ respectively, and the stack is translated to an abstract array
representation using the function stackToArray. The instances of the memory
predicate are created by translating the memory mapping $m$ to a relational
representation with abstract locations and values.⁴


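Table 3. Abstraction function for the local machine state $\mu$. Here
$sa^{i}_{v}$ denotes the array $sa$ updated at index $i$ with value $v$.

\[
\begin{aligned}
\alpha_\mu((gas, pc, m, aw, s), cd) :={}
  & \{\mathsf{MState}_{(id^*, pc)}(\mathsf{stackToArray}(s), \hat{aw}, \hat{gas}, cd)\} \\
  & \cup\ \{\mathsf{Mem}_{(id^*, pc)}(pos, \hat{v}, cd) \mid m[pos] = v \land pos < 2^{256}\} \\[4pt]
\mathsf{stackToArray}(\epsilon) :={} & (0, \lambda x.\, 0) \\
\mathsf{stackToArray}(x :: s) :={}
  & \text{let } (size, sa) = \mathsf{stackToArray}(s) \text{ in } (size + 1, sa^{size}_{\hat{x}})
\end{aligned}
\]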
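
The abstraction function can be mirrored in Python as follows. This is a sketch
under several assumptions that are not part of the formal definition: facts are
encoded as plain tuples, the array `sa` is a dictionary whose missing entries
stand for the default value 0, memory is given as a dictionary containing only
the cells of interest, and a value is considered to "potentially represent the
address of $c^*$" exactly when it equals the hypothetical parameter
`address_of_c`.

```python
A_STAR = "a*"  # symbolic address of the contract under analysis


def abstract_value(v, address_of_c):
    """Value abstraction: the address of c* becomes a*, everything else is kept."""
    return A_STAR if v == address_of_c else v


def stack_to_array(s, address_of_c):
    """Translate a stack (top element first) into (size, sa); sa[size - 1] is the top."""
    if not s:
        return 0, {}
    size, sa = stack_to_array(s[1:], address_of_c)
    return size + 1, {**sa, size: abstract_value(s[0], address_of_c)}


def alpha_mu(mu, cd, id_star, address_of_c):
    """Set of closed facts describing the machine state mu at relative call depth cd."""
    gas, pc, m, aw, s = mu
    size, sa = stack_to_array(s, address_of_c)
    pp = (id_star, pc)
    facts = {("MState", pp, (size, tuple(sorted(sa.items()))), aw, gas, cd)}
    for pos, v in m.items():
        if pos < 2 ** 256:
            facts.add(("Mem", pp, pos, abstract_value(v, address_of_c), cd))
    return facts
```

For the machine state `(100, 7, {0: 42, 32: 0xABC}, 2, (0xABC, 5))` and
`address_of_c = 0xABC`, the result consists of one `MState` fact whose stack
array maps position 0 to 5 and position 1 to `a*`, and two `Mem` facts.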


6.3 Abstract Execution Rules 


As all state predicates are parametrized by their program points, the abstract 
semantics needs to be formulated with respect to program points as well. More 
precisely this means that for each program counter of contract c* a set of Horn 
clauses is created that describes the semantics of the instruction at this program 
counter. Formally, a function $[\![\cdot]\!]_{pc}$ is defined that creates the
required set of rules given that the instruction $inst$ is at position $pc$ of
contract $c^*$'s code.


4 The reason for using a separate predicate for representing local memory instead of
encoding it as an argument of array type in the main machine state predicate is
purely technical: for modeling memory usage correctly we would need a rich set of
array operations that are, however, not supported by the fixedpoint engines of
modern SMT solvers.




Table 4 shows an excerpt of the rules defining $[\![\cdot]\!]_{pc}$ for the
ADD instruction. The main functionality of the rule is described by the Horn
clause 1 that describes how the machine stack and the gas evolve when executing
ADD. First, the precondition is checked, namely whether a sufficient amount of
gas and enough stack elements are available. Then the two (abstract) top elements
$\hat{x}$ and $\hat{y}$ are extracted from the stack and their sum is written to
the top of the stack while reducing the overall stack size by 1. In addition, the
local gas value is reduced by 3 in an abstract fashion. In the memory rule (Horn
clause 2), again the preconditions are checked and then (as memory is not
affected by the ADD instruction) the memory is propagated. This propagation is
needed due to the memory predicate's parametrization with the program counter:
for making the memory accessible in the next execution step, its values need to
be written into the corresponding predicate for the next program counter.
Finally, Horn clauses 3 and 4 characterize the exception cases: an exception
while executing the ADD instruction can occur either because of a stack
underflow or because the execution runs out of gas. In both cases the exception
state is entered, which is indicated by recording the relative call depth of the
exception in the predicate $\mathsf{Exc}_{id^*}(cd)$.

By allowing gas values to come from the abstract domain, we enable symbolic
treatment of gas. In particular, this means that when starting the analysis with
gas value $\top$, all gas calculations will directly result in $\top$ again (and
could therefore be omitted), and all checks on the gas will result in both true
and false. Consequently, both paths (regular execution via Horn clauses 1
and 2 and exception via Horn clause 4) will always be triggered in the analysis.
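
The effect of the abstract ADD rules on a single closed fact can be sketched in
Python as shown below. The encoding is hypothetical: facts are tuples, `TOP`
stands for $\top$, the function `abstract_add` covers only Horn clauses 1, 3 and
4 (the memory propagation of clause 2 is left out), and stack positions above
the new top are dropped from the array for readability.

```python
TOP = "⊤"


def abs_add(x, y):
    """Abstract addition: precise on numbers, ⊤ otherwise."""
    return x + y if isinstance(x, int) and isinstance(y, int) else TOP


def abs_sub(x, n):
    """Abstract subtraction of a constant."""
    return x - n if isinstance(x, int) else TOP


def abs_geq(x, n):
    """Set of possible outcomes of the abstract comparison x >= n."""
    return {x >= n} if isinstance(x, int) else {True, False}


def abstract_add(fact):
    """Facts derivable from one closed MState fact if the instruction is ADD."""
    _, (cid, pc), (size, sa), aw, gas, cd = fact
    sa = dict(sa)
    derived = set()
    enough_gas = abs_geq(gas, 3)
    if size > 1 and True in enough_gas:  # Horn clause 1: regular execution
        x, y = sa[size - 1], sa[size - 2]
        new_sa = {k: v for k, v in sa.items() if k < size - 2}
        new_sa[size - 2] = abs_add(x, y)
        derived.add(("MState", (cid, pc + 1),
                     (size - 1, tuple(sorted(new_sa.items()))),
                     aw, abs_sub(gas, 3), cd))
    if size < 2:                         # Horn clause 3: stack underflow
        derived.add(("Exc", cid, cd))
    if False in enough_gas:              # Horn clause 4: out of gas
        derived.add(("Exc", cid, cd))
    return derived
```

Starting from a fact with concrete gas 10 only the successor state is derived,
whereas starting from gas `TOP` both the successor state (with gas `TOP`) and
the exception fact are derived, which illustrates the symbolic treatment of gas
described above.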

For over-approximating the semantics of call instructions, more involved 
abstractions are needed. We will illustrate these abstractions in the following 
in an intuitive way and refer to [1] for the technical details. Note that in the 
following we will assume CALL instructions to be the only kind of transaction 
initiating instructions that are contained in the contracts that we consider for 
analysis. A generalization of the analysis that allows for incorporating also other 
call types is presented in [1]. 

As we consider $c^*$ to be the only known contract, whenever a call is
performed that is not a self-call, we need to assume that an arbitrary contract
$c'$ gets executed. The general idea for over-approximating calls to an unknown
contract $c'$ is that only those execution states that represent executions of
contract $c^*$ will be over-approximated. Consequently, when a call is performed,
all possible effects on future executions of $c^*$ that might be caused by the
execution of $c'$ (including the initiation of further internal transactions that
might cause reentering $c^*$) need to be captured. For doing this as accurately
as possible, we use the following observations:


1. Given that c* only executes plain CALL instructions the persistent storage of 
contract c* can only be altered during executions of c*. 

2. Contracts have a single entry point: their execution always starts in a fresh 
machine state at program counter zero. 


In general, we can soundly capture the possibility of contract $c^*$ being
reentered during the execution of $c'$ by assuming to reenter $c^*$ at every
higher call level. For keeping the desired precision, we can use the previously
made observations for imposing restrictions on the reenterings of $c^*$: First,
we assume the persistent storage of $c^*$ to be the same as at the point of
calling (observation 1). Second, we know that execution starts at program counter
0 in a fresh machine state (observation 2). This allows us to initialize the
machine state predicates presented in Table 2 accordingly at program counter
zero. All other parts of the global state and the execution environment need to
be considered unknown at the point of reentering, as they might have been
changed during the execution of $c'$. This in particular also applies to the
balance of contract $c^*$.

Table 4. Excerpt of the abstract rules for ADD

\[
\begin{aligned}
[\![\mathsf{ADD}]\!]_{pc} = \{\
& \mathsf{MState}_{(id^*, pc)}((size, sa), \hat{aw}, \hat{gas}, cd)
  \land size > 1 \land \hat{gas} \mathbin{\hat{\geq}} 3 \\
& \quad \land\ \hat{x} = sa[size - 1] \land \hat{y} = sa[size - 2] \\
& \quad \Rightarrow \mathsf{MState}_{(id^*, pc+1)}((size - 1, sa^{size-2}_{\hat{x} \mathbin{\hat{+}} \hat{y}}),
  \hat{aw}, \hat{gas} \mathbin{\hat{-}} 3, cd), && (1) \\
& \mathsf{Mem}_{(id^*, pc)}(pos, \hat{va}, cd)
  \land \mathsf{MState}_{(id^*, pc)}((size, sa), \hat{aw}, \hat{gas}, cd) \\
& \quad \land\ size > 1 \land \hat{gas} \mathbin{\hat{\geq}} 3
  \Rightarrow \mathsf{Mem}_{(id^*, pc+1)}(pos, \hat{va}, cd), && (2) \\
& \mathsf{MState}_{(id^*, pc)}((size, sa), \hat{aw}, \hat{gas}, cd) \land size < 2
  \Rightarrow \mathsf{Exc}_{id^*}(cd), && (3) \\
& \mathsf{MState}_{(id^*, pc)}((size, sa), \hat{aw}, \hat{gas}, cd) \land \hat{gas} \mathbin{\hat{<}} 3
  \Rightarrow \mathsf{Exc}_{id^*}(cd),\ \dots\ \} && (4)
\end{aligned}
\]

Fig. 5. Illustration of the abstraction of the semantics for the CALL instruction.

Figure 5 illustrates how the abstract configurations over-approximating the
concrete execution states of $c^*$ evolve within the execution of the abstract
semantics. We write $\Pi > S$ for denoting that an abstract configuration $\Pi$
(here graphically depicted in gray frames) is an over-approximation of call stack
$S$. The depicted execution starts in the initial execution state $s_{c^*}$ of
$c^*$. This state is over-approximated by assuming the storage and balance of
$c^*$ as well as all other information on the global state to be unknown and
therefore initialized to $\top$ in the corresponding state predicates of the
abstract configuration (denoted in the picture by marking the corresponding state
components in red). The execution steps representing the executions of local
instructions are mimicked step-wise by corresponding abstract execution steps.
During these steps, more refined knowledge about the state of $c^*$ and its
environment might be gained (e.g., the value of some storage cells where
information is written, or some restrictions on the account's balances, marked
in green or blue, respectively). When finally a CALL instruction is executed,
every potential reentering of contract $c^*$ (here exemplified by execution state
$t_{c^*}$) is over-approximated by abstract configurations for all call depths
$cd > 0$ that consider all global state and environmental information to be
arbitrary, but the parts modeling the persistent storage of $c^*$ to be as at the
point of calling. In Sect. 7 we will show how this abstraction will help us to
automatically check smart contracts for single-entrancy in a sound and precise
manner. In addition to these over-approximations that capture the effects on
$c^*$ during the execution of an unknown contract, for over-approximating CALL
instructions some other abstractions need to be performed that model the
semantics of returning:


— For returning it is always assumed that potentially the call failed or returned 
with arbitrary return values. 

— After returning the global state is assumed to be altered arbitrarily by the 
call and therefore its components are set to T. 


For a complete account and formal description of the abstractions, we refer to the 
full specification of the abstract semantics spelled out in the technical report [1]. 


7 Verifying Security Properties 


In this section, we will show how the previously presented analysis can be used 
for proving reachability properties of Ethereum smart contracts in an automated 
fashion. 

To this end, we review EtherTrust [1], the first sound static analyzer for 
EVM bytecode. EtherTrust proceeds by translating contract code provided in 
the bytecode format into an internal Horn clause representation. This Horn
clause representation, together with facts over-approximating all potential
initial configurations, is handed to the SMT solver Z3 [45] via an API. For showing
that the analyzed contract satisfies a reachability property, the unsatisfiability 
of the corresponding analysis queries needs to be verified using Z3's fixedpoint 
engine SPACER [46]. If all analysis queries are deemed unsatisfiable then the 
contract under analysis is guaranteed to satisfy the original reachability query 
due to the soundness of the underlying analysis. 

In the following we will discuss the analysis queries used for verifying single- 
entrancy and illustrate how these queries allow for capturing contracts that are 
vulnerable to reentrancy such as the example presented in Sect. 5. 


7.1 Over-Approximating Single-Entrancy


For being able to automatically check for single-entrancy, we need to simplify 
the original property in order to obtain a description that is expressible in terms 
of the analysis framework described in Sect. 6. To this end, a strictly stronger
property named call unreachability is presented that is proven to imply single- 
entrancy: 


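Definition 2 (Call unreachability [1]). A contract $c$ is call unreachable if for
all initial execution states $(\mu, \iota, \sigma)$ such that
$(\mu, \iota, \sigma)_c$ is well formed, it holds that for all transaction
environments $\Gamma$ and all call stacks $S$

\[
\begin{aligned}
\neg \exists\, s, S'.\ & \Gamma \vDash (\mu, \iota, \sigma)_c :: S \rightarrow^* s_c :: S' \mathbin{+\!+} S \\
& \land\ |S'| > 0 \ \land\ code(c)[s.\mu.pc] \in Inst_{call}
\end{aligned}
\]

with $Inst_{call} = \{\mathsf{CALL}, \mathsf{CALLCODE}, \mathsf{DELEGATECALL}, \mathsf{CREATE}\}$.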


Intuitively, this property states that it should not be possible to reach a call 
instruction of c* after reentering. As we are excluding all transaction initiating 
instructions but CALL from the analysis, it is sufficient to query for the reacha- 
bility of a CALL instruction of c* on a higher call depth. More precisely, we end 
up with the following set of queries: 


\[
\{\mathsf{MState}_{(id^*, pc)}((size, sa), \hat{aw}, \hat{gas}, cd) \land cd > 0
  \mid code(c^*)[pc] = \mathsf{CALL}\} \qquad (5)
\]


As the $\mathsf{MState}_{pp}$ predicate tracks the machine state at all program
points, it can be used as an indicator for the reachability of the program point
as such. Consequently, by querying $\mathsf{MState}_{(id^*, pc)}$ for all program
counters $pc$ where $c^*$ has a CALL instruction and along with that requiring a
call depth exceeding zero, we can check whether a call instruction is reachable
in some reentering execution.
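
Collecting the program counters for this query set amounts to a linear scan over
the bytecode. The sketch below assumes the standard EVM opcode values (0xF1 for
CALL, 0x60 to 0x7F for PUSH1 to PUSH32) and skips the immediate data of PUSH
instructions so that data bytes are not mistaken for instructions. The function
names and the textual query format are hypothetical and do not reflect
EtherTrust's actual interface.

```python
CALL = 0xF1
PUSH1, PUSH32 = 0x60, 0x7F


def call_program_counters(code: bytes):
    """Program counters at which the given bytecode contains a CALL instruction."""
    pcs, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if op == CALL:
            pcs.append(pc)
        if PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1  # skip the bytes pushed by PUSHn
        pc += 1
    return pcs


def single_entrancy_queries(code: bytes, id_star: str = "id*"):
    """One textual reachability query per CALL instruction, in the style of Eq. 5."""
    return [f"MState_({id_star}, {pc})((size, sa), aw, gas, cd) ∧ cd > 0"
            for pc in call_program_counters(code)]
```

For the byte sequence `60 f1 f1 00` (PUSH1 0xf1, CALL, STOP), only program
counter 2 is reported: the byte at position 1 is push data, not an instruction.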


7.2 Examples 


We will use examples for showing how the analysis detects reentrancy bugs and
how it proves their absence. To this end, we revisit the contract Bob presented
in Sect. 5, and introduce a contract Alice that fixes the reentrancy bug that is
present in Bob. The two contracts are shown in Figure 6.


Detecting Reentrancy Bugs. We illustrate how the analysis detects reentrancy
bugs using the example in Figure 6a. To this end we give a graphical
description of the over-approximations performed when analyzing contract Bob 
which is depicted in Figure 7. For the sake of presentation, we give the contract 
code in Solidity instead of bytecode and argue about it on this level even though 
the analysis is carried out on bytecode level. 

As discussed in Sect. 6.3, the analysis considers the execution of contract Bob
to start in an unknown environment, which implies that also the value of the
contract's sent variable is unknown and hence initialized to $\top$. As a
consequence, the equality check in line 4 is considered to evaluate to both true
and false in the abstract setting (as $\top$ needs to be considered to
potentially equal every concrete value). Accordingly, the analysis needs to
consider the then-branch of the conditional and consequently the call in line 4.
This call is over-approximated as discussed in Sect. 6.3, and therefore considers
reentering contract Bob in an arbitrary call depth. In this situation, the sent
variable is still over-approximated to have value $\top$, so the call at line 4
can be reached again, which satisfies the reachability query in Eq. 5.

(a) Smart contract with reentrancy bug

    1 contract Bob{
    2   bool sent = false;
    3   function ping(address c) {
    4     if (!sent) { c.call.value(2)();
    5                  sent = true; }}}

(b) Smart contract with fixed reentrancy bug

    1 contract Alice{
    2   bool sent = false;
    3   function ping(address c) {
    4     if (!sent) { sent = true;
    5                  c.call.value(2)(); }}}

Fig. 6. Examples for contracts showing and being robust against the reentrancy bug.


Proving Single-Entrancy. We consider the contract Alice shown in Figure 6b.
In contrast to contract Bob, this contract does not have the reentrancy
vulnerability, as the guard sent that should prevent the call instruction in
line 5 from being executed more than once is set before performing the call. As
a consequence, when reentering the contract, the guard is already set and stops
any further calls. We show that the analysis presented in Sect. 6 is precise
enough for proving this contract to be single-entrant. Intuitively, the
abstraction is precise as it considers that the contract's persistent storage
can be assumed to be unchanged at the point of reentering. Consequently, the
then-branch of the conditional can be excluded from the analysis when
reentering and the contract can be proven to be single-entrant. A graphic
description of this argument is provided in Figure 8. As for contract Bob, the
analysis starts in an abstract configuration that assigns the sent variable
value $\top$, which forces the analysis to consider the then-branch as well as
the else-branch of the conditional in line 4. When taking the else-branch, the
contract execution terminates without reaching a state satisfying the
reachability query. Therefore, it is sufficient to only consider the
then-branch for proving the impossibility of re-reaching the call instruction.
When executing the call in the then-branch, according to the abstract call
semantics, the analysis needs to take all abstract configurations representing
executions of Alice at higher call depths into account. However, in each of
these abstract configurations it can be assumed that the state of the
persistent storage (including the sent variable, highlighted in green) is the
same as at the point of calling. As at this point sent was already set to the
concrete value true, the then-branch of the conditional can be excluded from
the analysis at any call depth cd > 0 and consequently the unreachability of
the query in Eq. 5 is proven.
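The two arguments above can be replayed in a few lines of Python. This is only an illustrative sketch and not the Horn-clause encoding used by EtherTrust: the three-valued domain (True, False, TOP) and the function names are assumptions made here, and the only storage cell modelled is sent.

```python
TOP = "TOP"  # abstract value standing for "any concrete Boolean"


def may_be_false(value):
    """An abstract Boolean may be false if it is False or unknown (TOP)."""
    return value is False or value == TOP


def call_reachable_on_reentry(set_guard_before_call, sent_at_entry=TOP):
    """Return True if the CALL may be reached at some call depth cd > 0.

    set_guard_before_call=False models Bob (call first, then sent = true);
    set_guard_before_call=True models Alice (sent = true, then call).
    """
    # Call depth 0: the then-branch is explored only if sent may be false.
    if not may_be_false(sent_at_entry):
        return False  # only the else-branch remains, which performs no call
    # Storage at the moment of the call: Alice has already written
    # sent = true, Bob has not touched sent yet.
    sent_at_call = True if set_guard_before_call else sent_at_entry
    # Reentering execution (cd > 0): storage is assumed to be unchanged
    # since the point of calling.
    sent_on_reentry = sent_at_call
    # The query is satisfiable if the guard may still be false on reentry.
    return may_be_false(sent_on_reentry)
```

Under these assumptions, `call_reachable_on_reentry(False)` reports the call in Bob as reachable again, while `call_reachable_on_reentry(True)` rules it out for Alice.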




[Figure 7: abstract executions of contract Bob, starting at the check "if sent == false" and reentering at the call ?.call.value(2)(); reachability query: cd > 0 ∧ ?.call.value(2)().]

Fig. 7. Illustration of the attack detection in contract Bob by the static analysis.


7.3 Discussion 


In this section, we illustrated how the static analysis underlying EtherTrust [1]
is in principle capable not only of detecting reentrancy bugs, but also of
proving smart contracts single-entrant. In practice, EtherTrust manages to
analyze real-world contracts from the blockchain within several seconds, as
detailed in the experimental evaluation presented in [1]. Even though
EtherTrust produces false positives due to the performed over-approximations,
it still shows better precision on a benchmark than the state-of-the-art
bug-finding tool Oyente [16], despite being sound. Similar results are shown
when using EtherTrust for checking a simple value independency property.

In general, EtherTrust could be easily extended to support more properties
on contract execution, given that those properties or over-approximations of
them are expressible as reachability or simple value independency properties. By
contrast, checking more involved hyperproperties, or properties that span more
than one execution of the external transaction, is currently out of scope for
EtherTrust.
[Figure 8: abstract executions of contract Alice; at every reentry the check "if sent == false" is evaluated with sent = true, so the call ?.call.value(2)() is not reached at call depth cd > 0.]

Fig. 8. Illustration of proving single-entrancy of contract Alice by the static analysis.


8 Conclusion 


We presented a systematization of the state of the art in Ethereum smart
contract verification and outlined the open challenges in this field. We also
discussed in detail the foundations of EtherTrust [1], the first sound static
analyzer for EVM bytecode. In particular, we reviewed how the small-step
semantics presented in [15] is abstracted into a set of Horn clauses. We also
presented how single-entrancy, a relevant smart contract security property, is
expressed in terms of queries, which can then be solved automatically by
leveraging the power of an SMT solver.
Abstract. We present layered concurrent programs, a compact and
expressive notation for specifying refinement proofs of concurrent pro- 
grams. A layered concurrent program specifies a sequence of connected 
concurrent programs, from most concrete to most abstract, such that 
common parts of different programs are written exactly once. These pro- 
grams are expressed in the ordinary syntax of imperative concurrent 
programs using gated atomic actions, sequencing, choice, and (recursive) 
procedure calls. Each concurrent program is automatically extracted 
from the layered program. We reduce refinement to the safety of a 
sequence of concurrent checker programs, one each to justify the connec- 
tion between every two consecutive concurrent programs. These checker 
programs are also automatically extracted from the layered program. 
Layered concurrent programs have been implemented in the CIVL verifier 
which has been successfully used for the verification of several complex 
concurrent programs. 


1 Introduction 


Refinement is an approach to program correctness in which a program is
expressed at multiple levels of abstraction. For example, we could have a sequence
of programs $P_1, \ldots, P_h, P_{h+1}$ where $P_1$ is the most concrete and $P_{h+1}$ is the
most abstract. Program $P_1$ can be compiled and executed efficiently, $P_{h+1}$ is
obviously correct, and the correctness of $P_i$ is guaranteed by the correctness of
$P_{i+1}$ for all $i \in [1, h]$. These three properties together ensure that $P_1$ is both
efficient and correct. To use the refinement approach, the programmer must come
up with each version $P_i$ of the program and a proof that the correctness of $P_{i+1}$
implies the correctness of $P_i$. This proof typically establishes a connection from
every behavior of $P_i$ to some behavior of $P_{i+1}$.

Refinement is an attractive approach to the verified construction of complex
programs for a number of reasons. First, instead of constructing a single
monolithic proof of $P_1$, the programmer constructs a collection of localized proofs
establishing the connection between $P_i$ and $P_{i+1}$ for each $i \in [1, h]$. Each
localized proof is considerably simpler than the overall proof because it only needs to
reason about the (relatively small) difference between adjacent programs.
Second, different localized proofs can be performed using different reasoning
methods, e.g., interactive deduction, automated testing, or even informal reasoning.


© The Author(s) 2018 
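[Figure 1: chain of programs $P_1, P_2, \ldots, P_i, P_{i+1}, \ldots, P_h, P_{h+1}$, where each adjacent pair $P_i$, $P_{i+1}$ is connected by a checker program $C_i$ (from $C_1$ to $C_h$); the two directions along the chain are labeled abstraction and refinement.]

Fig. 1. Concurrent programs $P_i$ and connecting checker programs $C_i$ represented by a
layered concurrent program $\mathcal{LP}$.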


Finally, refinement naturally supports a bidirectional approach to correctness— 
bottom-up verification of a concrete program via successive abstraction or top- 
down derivation from an abstract program via successive concretization. 

This paper explores the use of refinement to reason about concurrent programs.
Most refinement-oriented approaches model a concurrent program as a
flat transition system, a representation that is useful for abstract programs but 
becomes increasingly cumbersome for a concrete implementation. To realize the 
goal of verified construction of efficient and implementable concurrent programs, 
we must be able to uniformly and compactly represent both highly-detailed and 
highly-abstract concurrent programs. This paper introduces layered concurrent 
programs as such a representation. 

A layered concurrent program $\mathcal{LP}$ represents a sequence $P_1, \ldots, P_h, P_{h+1}$ of
concurrent programs such that common parts of different programs are written
exactly once. These programs are expressed not as flat transition systems but
in the ordinary syntax of imperative concurrent programs using gated atomic
actions [4], sequencing, choice, and (recursive) procedure calls. Our programming
language is accompanied by a type system that allows each $P_i$ to be
automatically extracted from $\mathcal{LP}$. Finally, refinement between $P_i$ and $P_{i+1}$ is encoded
as the safety of a checker program $C_i$ which is also automatically extracted from
$\mathcal{LP}$. Thus, the verification of $P_1$ is split into the verification of h concurrent
checker programs $C_1, \ldots, C_h$ such that $C_i$ connects $P_i$ and $P_{i+1}$ (Fig. 1).

We highlight two crucial aspects of our approach. First, while the programs $P_i$
have an interleaved (i.e., preemptive) semantics, we verify the checker programs
$C_i$ under a cooperative semantics in which preemptions occur only at procedure
calls. Our type system [5] based on the theory of right and left movers [10] ensures
that the cooperative behaviors of $C_i$ cover all preemptive behaviors of $P_i$. Second,
establishing the safety of checker programs is not tied to any particular
verification technique. Any applicable technique can be used. In particular, different
layers can be verified using different techniques, allowing for great flexibility in
verification options.


1.1 Related Work 


This paper formalizes, clarifies, and extends the most important aspect of
the design of CIVL [6], a deductive verifier for layered concurrent programs.
Hawblitzel et al. [7] present a partial explanation of CIVL by formalizing the
connection between two concurrent programs as sound program transformations.
In this paper, we provide the first formal account for layered concurrent pro- 
grams to represent all concurrent programs in a multi-layered refinement proof, 
thereby establishing a new foundation for the verified construction of concurrent 
programs. 

CIVL is the successor to the QED [4] verifier which combined a type system for 
mover types with logical reasoning based on verification conditions. QED enabled 
the specification of a layered proof but required each layer to be expressed in 
a separate file leading to code duplication. Layered programs reduce redundant 
work in a layered proof by enabling each piece of code to be written exactly once. 
QED also introduced the idea of abstracting an atomic action to enable attach- 
ing a stronger mover type to it. This idea is incorporated naturally in layered 
programs by allowing a concrete atomic action to be wrapped in a procedure 
whose specification is a more abstract atomic action with a more precise mover 
type. 

Event-B [1] is a modeling language that supports refinement of systems 
expressed as interleaved composition of events, each specified as a top-level 
transition relation. Verification of Event-B specifications is supported by the 
Rodin [2] toolset which has been used to model and verify several systems of 
industrial significance. TLA+ [9] also specifies systems as a flat transition system,
enables refinement proofs, and is more general because it supports liveness
specifications. Our approach to refinement is different from Event-B and TLA+
for several reasons. First, Event-B and TLA+ model different versions of the
program as separate flat transition systems whereas our work models them as 
different layers of a single layered concurrent program, exploiting the standard 
structuring mechanisms of imperative programs. Second, Event-B and TLA+ 
connect the concrete program to the abstract program via an explicitly specified 
refinement mapping. Thus, the guarantee provided by the refinement proof is 
contingent upon trusting both the abstract program and the refinement map- 
ping. In our approach, once the abstract program is proved to be free of failures, 
the trusted part of the specification is confined to the gates of atomic actions in 
the concrete program. Furthermore, the programmer never explicitly specifies a 
refinement mapping and is only engaged in proving the correctness of checker 
programs. 

The methodology of refinement mappings has been used for compositional 
verification of hardware designs [11,12]. The focus in this work is to decompose 
a large refinement proof connecting two versions of a hardware design into a 
collection of smaller proofs. A variety of techniques including compositional rea- 
soning (converting a large problem to several small problems) and customized 
abstractions (for converting infinite-state to finite-state problems) are used to 
create small and finite-state verification problems for a model checker. This work 
is mostly orthogonal to our contribution of layered programs. Rather, it could be 
considered an approach to decompose the verification of each (potentially large) 
checker program encoded by a layered concurrent program. 
2 Concurrent Programs


In this section we introduce a concurrent programming language. The syntax of 
our programming language is summarized in Fig. 2. 


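$$
\begin{array}{ll}
\mathit{Val} \supseteq \mathbb{B} & \\
v \in \mathit{Var} = \mathit{GVar} \cup \mathit{LVar} & gs \in 2^{\mathit{GVar}} \\
I, O, L \subseteq \mathit{LVar} & as \in A \mapsto (I, O, e, t) \\
\sigma \in \mathit{Store} = \mathit{Var} \to \mathit{Val} & ps \in P \mapsto (I, O, L, s) \\
e \in \mathit{Expr} = \mathit{Store} \to \mathit{Val} & m \in \mathit{Proc} \cup \mathit{Action} \\
t \in \mathit{Trans} = 2^{\mathit{Store} \times \mathit{Store}} & \mathcal{I} \in 2^{\mathit{Store}} \\
A \in \mathit{Action},\ P \in \mathit{Proc} & \mathcal{P} \in \mathit{Prog} ::= (gs, as, ps, m, \mathcal{I}) \\
\iota, o \in \mathit{IOMap} = \mathit{LVar} \rightharpoonup \mathit{LVar} & \\
\end{array}
$$

$$
s \in \mathit{Stmt} ::= \mathtt{skip} \mid s ; s \mid \mathtt{if}\ e\ \mathtt{then}\ s\ \mathtt{else}\ s \mid \mathtt{pcall}\ \overline{(A, \iota, o)}\ \overline{(P, \iota, o)}\ \overline{(A, \iota, o)}
$$

Fig. 2. Concurrent programs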


Preliminaries. Let Val be a set of values containing the Booleans. The set of
variables Var is partitioned into global variables GVar and local variables LVar.
A store $\sigma$ is a mapping from variables to values, an expression e is a mapping
from stores to values, and a transition t is a binary relation between stores.


Atomic Actions. A fundamental notion in our approach is that of an atomic 
action. An atomic action captures an indivisible operation on the program state 
together with its precondition, providing a universal representation for both low- 
level machine operations (e.g., reading a variable from memory) and high-level 
abstractions (e.g., atomic procedure summaries). Most importantly for reasoning 
purposes, our programming language confines all accesses to global variables to 
atomic actions. Formally, an atomic action is a tuple (I, O, e, t). The semantics 
of an atomic action in an execution is to first evaluate the expression e, called 
the gate, in the current state. If the gate evaluates to false the execution fails, 
otherwise the program state is updated according to the transition t. Input vari- 
ables in I can be read by e and t, and output variables in O can be written 
by t. 


Remark 1. Atomic actions subsume many standard statements, in particular
(nondeterministic) assignments, assertions, and assumptions. The following table
shows some examples for programs over variables x and y.


    command         e        t

    x := x + y      true     x' = x + y ∧ y' = y
    havoc x         true     y' = y
    assert x < y    x < y    x' = x ∧ y' = y
    assume x < y    true     x < y ∧ x' = x ∧ y' = y
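To make the gate/transition reading concrete, the following Python sketch executes atomic actions over stores with two variables x and y. It is an illustration only: the finite value domain VALUES, the GateFailure exception, and the helper step are assumptions introduced here, and the example commands (such as an assignment x := x + y) are written as (gate, transition) pairs in the spirit of Remark 1.

```python
from itertools import product

VALUES = range(5)  # small hypothetical value domain for x and y


class GateFailure(Exception):
    """Raised when an atomic action is executed in a store violating its gate."""


def step(gate, trans, store):
    """Return all successor stores of one atomic action, or fail on the gate."""
    if not gate(store):
        raise GateFailure(store)
    candidates = ({"x": x, "y": y} for x, y in product(VALUES, repeat=2))
    # The transition is a relation between the old and the new store.
    return [new for new in candidates if trans(store, new)]


def true(store):
    return True


# (gate, transition) pairs; n is the new store, s the old one.
assign = (true, lambda s, n: n["x"] == s["x"] + s["y"] and n["y"] == s["y"])
havoc_x = (true, lambda s, n: n["y"] == s["y"])
assert_lt = (lambda s: s["x"] < s["y"], lambda s, n: n == s)
assume_lt = (true, lambda s, n: s["x"] < s["y"] and n == s)
```

Note the difference between the last two actions in this sketch: in a store with x >= y, `step(*assume_lt, store)` returns an empty list (the execution blocks), whereas `step(*assert_lt, store)` raises GateFailure (the execution fails).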
Procedures. A procedure is a tuple (I, O, L, s) where I, O, L are the input,
output, and local variables of the procedure, and s is a statement composed from 
skip, sequencing, if, and parallel call statements. Since only atomic actions can 
refer to global variables, the variables accessed in if conditions are restricted to 
the inputs, outputs, and locals of the enclosing procedure. The meaning of skip, 
sequencing, and if is as expected and we focus on parallel calls. 


Pcalls. A parallel call (pcall, for short) pcall $\overline{(A, \iota, o)}\ \overline{(P, \iota, o)}\ \overline{(A, \iota, o)}$ consists
of a sequence of invocations of atomic actions and procedures. We refer to the
invocations as the arms of the pcall. In particular $(A, \iota, o)$ is an atomic-action
arm and $(P, \iota, o)$ is a procedure arm. An atomic-action arm executes the called
atomic action, and a procedure arm creates a child thread that executes the
statement of the called procedure. The parent thread is blocked until all arms
of the pcall finish. In the standard semantics the order of arms does not matter,
but our verification technique will allow us to consider the atomic action arms
before and after the procedure arms to execute in the specified order. Parameter
passing is expressed using partial mappings $\iota$, $o$ between local variables; $\iota$ maps
formal inputs of the callee to actual inputs of the caller, and $o$ maps actual
outputs of the caller to formal outputs of the callee. Since we do not want
to introduce races on local variables, the outputs of all arms must be disjoint
and the output of one arm cannot be an input to another arm. Finally, notice
that our general notion of a pcall subsumes sequential statements (single
atomic-action arm), synchronous procedure calls (single procedure arm), and unbounded
thread creation (recursive procedure arm).


Concurrent Programs. A concurrent program $\mathcal{P}$ is a tuple $(gs, as, ps, m, \mathcal{I})$,
where gs is a finite set of global variables used by the program, as is a finite
mapping from action names A to atomic actions, ps is a finite mapping from
procedure names P to procedures, m is either a procedure or action name that
denotes the entry point for program executions, and $\mathcal{I}$ is a set of initial stores.
For convenience we will liberally use action and procedure names to refer to the 
corresponding atomic actions and procedures. 


Semantics. Let $\mathcal{P} = (gs, as, ps, m, \mathcal{I})$ be a fixed concurrent program. A state
consists of a global store assigning values to the global variables and a pool 
of threads, each consisting of a local store assigning values to local variables 
and a statement that remains to be executed. An execution is a sequence of 
states, where from each state to the next some thread is selected to execute one 
step. Every step that switches the executing thread is called a preemption (also 
called a context switch). We distinguish between two semantics that differ in 
(1) preemption points, and (2) the order of executing the arms of a pcall. 

In preemptive semantics, a preemption is allowed anywhere and the arms 
of a pcall are arbitrarily interleaved. In cooperative semantics, a preemption is 
allowed only at the call and return of a procedure, and the arms of a pcall are 
executed as follows. First, the leading atomic-action arms are executed from left 
to right without preemption, then all procedure arms are executed arbitrarily 
interleaved, and finally the trailing atomic-action arms are executed, again from 
left to right without preemption. In other words, a preemption is only allowed
when a procedure arm of a pcall creates a new thread and when a thread termi- 
nates. 

For $\mathcal{P}$ we only consider executions that start with a single thread that executes
m from a store in $\mathcal{I}$. $\mathcal{P}$ is called safe if there is no failing execution, i.e., an
execution that executes an atomic action whose gate evaluates to false. We
write $\mathit{Safe}(\mathcal{P})$ if $\mathcal{P}$ is safe under preemptive semantics, and $\mathit{CSafe}(\mathcal{P})$ if $\mathcal{P}$ is safe
under cooperative semantics.


2.1 Running Example 


In this section, we introduce a sequence of three concurrent programs (Fig. 3) 
to illustrate features of our concurrent programming language and the layered 
approach to program correctness. Consider the program $P_1^{lock}$ in Fig. 3(a). The
program uses a single global Boolean variable b which is accessed by the two 
atomic actions CAS and RESET. The compare-and-swap action CAS atomically 
reads the current value of b and either sets b from false to true and returns 
true, or leaves b true and returns false. The RESET action sets b to false and 
has a gate (represented as an assertion) that states that the action must only 
be called when b is true. Using these actions, the procedures Enter and Leave 
implement a spinlock as follows. Enter calls the CAS action and retries (through 
recursion on itself) until it succeeds to set b from false to true. Leave just 
calls the RESET action which sets b back to false and thus allows another thread 
executing Enter to stop spinning. Finally, the procedures Main and Worker serve 
as a simple client. Main uses a pcall inside a nondeterministic if statement to 
create an unbounded number of concurrent worker threads, which just acquire 
the lock by calling Enter and then release the lock again by calling Leave. The 
call to the empty procedure Alloc is an artifact of our extraction from a layered 
concurrent program and can be removed as an optimization. 

Proving $P_1^{lock}$ safe amounts to showing that RESET is never called with b set
to false, which expresses that $P_1^{lock}$ follows a locking discipline of releasing only
previously acquired locks. Doing this proof directly on $P_1^{lock}$ has two drawbacks.
First, the proof must relate the possible values of b with the program counters
of all running threads. In general, this approach requires sound introduction of
ghost code and results in complicated case distinctions in program invariants.
Second, the proof is not reusable across different lock implementations. The
correctness of the client does not specifically depend on using a spinlock over
a Boolean variable, and thus the proof should not either. We show how our
refinement-based approach addresses both problems.
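For a fixed number of workers, the safety property can also be checked by brute force, which makes the statement "RESET is never called with b set to false" tangible. The Python sketch below is an assumption-laden illustration and not part of the layered-program methodology: it replaces the unbounded thread creation of Main by a fixed number of workers, interleaves at the granularity of the atomic actions CAS and RESET, and adds a hypothetical faulty client (leave_twice) to show a failing execution.

```python
def reset_gate_never_fails(num_workers, leave_twice=False):
    """Explore every interleaving of the workers' atomic actions.

    A worker at pc 0 is inside Enter (one CAS attempt per step); afterwards
    it performs one RESET (two if leave_twice is set) and then terminates.
    Returns True iff no reachable RESET is executed while b is false.
    """
    resets = 2 if leave_twice else 1
    init = (False, (0,) * num_workers)  # (global b, program counters)
    seen, todo = {init}, [init]
    while todo:
        b, pcs = todo.pop()
        for i, pc in enumerate(pcs):
            if pc == 0:  # CAS: succeeds only if b is false, else retry
                nb, npc = (True, 1) if not b else (True, 0)
            elif pc <= resets:  # RESET with gate 'assert b'
                if not b:
                    return False  # failing execution found
                nb, npc = False, pc + 1
            else:
                continue  # this worker has terminated
            nxt = (nb, pcs[:i] + (npc,) + pcs[i + 1:])
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True
```

Such an exploration only covers one bounded instance and relates b to all program counters explicitly, which is exactly the kind of reasoning the refinement-based proof avoids.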

Program $P_2^{lock}$ in Fig. 3(b) is an abstraction of $P_1^{lock}$ that introduces an
abstract lock specification. The global variable b is replaced by lock which
ranges over integer thread identifiers (0 is a dedicated value indicating that
the lock is available). The procedures Alloc, Enter and Leave are replaced by
the atomic actions ALLOC, ACQUIRE and RELEASE, respectively. ALLOC allocates
unique and non-zero thread identifiers using a set of integers slots to store the
identifiers not allocated so far. ACQUIRE blocks executions where the lock is not
available (assume lock == 0) and sets lock to the identifier of the acquiring
thread. RELEASE asserts that the releasing thread holds the lock and sets lock
to 0. Thus, the connection between $P_1^{lock}$ and $P_2^{lock}$ is given by the invariant
b <==> lock != 0 which justifies that Enter refines ACQUIRE and Leave refines
RELEASE. The potential safety violation in $P_1^{lock}$ by the gate of RESET is
preserved in $P_2^{lock}$ by the gate of RELEASE. In fact, the safety of $P_2^{lock}$ expresses the
stronger locking discipline that the lock can only be released by the thread that
acquired it.

  (a) $P_1^{lock}$

      var b : bool

      proc Main()
        if (*)
          pcall Worker(), Main()

      proc Worker()
        pcall Alloc()
        pcall Enter()
        pcall Leave()

      proc Alloc() : ()
        skip

      proc Enter()
        var success : bool
        pcall success := CAS()
        if (success)
          skip
        else
          pcall Enter()

      proc Leave()
        pcall RESET()
        skip

      atomic CAS() : (success : bool)
        if (b) success := false
        else success, b := true, true

      atomic RESET()
        assert b
        b := false

  (b) $P_2^{lock}$

      var lock : int
      var linear slots : set<int>

      proc Main()
        if (*)
          pcall Worker(), Main()

      proc Worker()
        var linear tid : int
        pcall tid := ALLOC()
        pcall ACQUIRE(tid)
        pcall RELEASE(tid)

      right ALLOC() : (linear tid : int)
        assume tid != 0 && tid ∈ slots
        slots := slots - tid

      right ACQUIRE(linear tid : int)
        assert tid != 0
        assume lock == 0
        lock := tid

      left RELEASE(linear tid : int)
        assert tid != 0 && lock == tid
        lock := 0

  (c) $P_3^{lock}$

      both SKIP()
        skip

Fig. 3. Lock example

Reasoning in terms of ACQUIRE and RELEASE instead of Enter and Leave is
more general, but it is also simpler! Figure 3(b) declares atomic actions with a
mover type [5], right for right mover, and left for left mover. A right mover
executed by a thread commutes to the right of any action executed by a different
thread. Similarly, a left mover executed by a thread commutes to the left of any
action executed by a different thread. A sequence of right movers followed by
at most one non-mover followed by a sequence of left movers in a thread can
be considered atomic [10]. The reason is that any interleaved execution can
be rearranged (by commuting atomic actions), such that these actions execute
consecutively. For $P_2^{lock}$ this means that Worker is atomic and thus the gate of
RELEASE can be discharged by pure sequential reasoning; ALLOC guarantees
tid != 0 and after executing ACQUIRE we have lock == tid. As a result, we finally
obtain that the atomic action SKIP in $P_3^{lock}$ (Fig. 3(c)) is a sound abstraction
of procedure Main in $P_2^{lock}$. Hence, we showed that program $P_1^{lock}$ is safe by
soundly abstracting it to $P_3^{lock}$, a program that is trivially safe.

The correctness of right and left annotations on ACQUIRE and RELEASE, 
respectively, depends on pair-wise commutativity checks among atomic actions 
in $P_2^{lock}$. These commutativity checks will fail unless we exploit the fact that
every thread identifier allocated by Worker using the ALLOC action is unique. For 
instance, to show that ACQUIRE executed by a thread commutes to the right of 
RELEASE executed by a different thread, it must be known that the parameters 
tid to these actions are distinct from each other. The linear annotation on 
the local variables named tid and the global variable slots (which is a set of 
integers) is used to communicate this information. 
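The role of distinct thread identifiers can be seen in a small brute-force check. The Python sketch below uses a deliberately simplified right-mover condition, which is an assumption of this illustration and not the exact set of conditions checked by CIVL: whenever the first action is followed by a non-failing second action, the second action must also be non-failing when executed first, and the swapped order must be able to reach the same final state. The state consists of the variable lock only, with hypothetical values 0, 1, 2.

```python
LOCK_VALUES = (0, 1, 2)  # 0 means "free"; 1 and 2 are thread identifiers


def acquire(tid):
    """ACQUIRE(tid): assert tid != 0; assume lock == 0; lock := tid."""
    gate = lambda lock: tid != 0
    trans = lambda lock: [tid] if lock == 0 else []  # blocks unless free
    return gate, trans


def release(tid):
    """RELEASE(tid): assert tid != 0 && lock == tid; lock := 0."""
    gate = lambda lock: tid != 0 and lock == tid
    trans = lambda lock: [0]
    return gate, trans


def commutes_right(first, second):
    """Simplified right-mover check of `first` against `second`."""
    (gate1, trans1), (gate2, trans2) = first, second
    for s in LOCK_VALUES:
        if not gate1(s):
            continue
        for s1 in trans1(s):
            if not gate2(s1):
                continue  # `second` fails after `first`: nothing to commute
            if not gate2(s):
                return False  # swapping would make `second` fail
            for s2 in trans2(s1):
                swapped = any(gate1(m) and s2 in trans1(m) for m in trans2(s))
                if not swapped:
                    return False
    return True
```

With distinct identifiers, `commutes_right(acquire(1), release(2))` holds, whereas with equal identifiers `commutes_right(acquire(1), release(1))` does not, because RELEASE(1) cannot be moved in front of the ACQUIRE(1) that makes its gate true.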

The overall invariant encoded by the linear annotation is that the set of 
values stored in slots and in local linear variables of active stack frames across 
all threads are pairwise disjoint. This invariant is guaranteed by a combination of 
a linear type system [14] and logical reasoning on the code of all atomic actions. 
The linear type system ensures using a flow analysis that a value stored in a linear 
variable in an active stack frame is not copied into another linear variable via 
an assignment. Each atomic action must ensure that its state update preserves 
the disjointness invariant for linear variables. For actions ACQUIRE and RELEASE, 
which do not modify any linear variables, this reasoning is trivial. However, 
action ALLOC modifies slots and updates the linear output parameter tid. Its 
correctness depends on the (semantic) fact that the value put into tid is removed 
from slots; this reasoning can be done using automated theorem provers. 
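The semantic obligation on ALLOC can be illustrated with a small executable model. The sketch below is an assumption made for illustration: slots is a Python frozenset, the linear local variables of all threads are collected in a list, and broken_alloc is a hypothetical faulty variant that forgets to remove the chosen identifier.

```python
def disjoint(slots, tids):
    """Invariant: no value occurs twice across `slots` and the local tids."""
    return len(set(tids)) == len(tids) and not (set(tids) & slots)


def alloc(slots, choose=min):
    """ALLOC: pick a non-zero tid from slots and remove it from slots."""
    tid = choose(v for v in slots if v != 0)
    return slots - {tid}, tid


def broken_alloc(slots, choose=min):
    """Like alloc, but leaves the chosen identifier in slots."""
    tid = choose(v for v in slots if v != 0)
    return slots, tid


# Three allocations in a row preserve the disjointness invariant.
slots, tids = frozenset(range(1, 6)), []
for _ in range(3):
    slots, tid = alloc(slots)
    tids.append(tid)
    assert disjoint(slots, tids)
```

With broken_alloc the invariant is violated after the first call, since the returned tid is still a member of slots and could be handed out a second time.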


3 Layered Concurrent Programs 


A layered concurrent program represents a sequence of concurrent programs 
that are connected to each other. That is, the programs derived from a layered 
concurrent program share syntactic structure, but differ in the granularity of 
the atomic actions and the set of variables they are expressed over. In a layered 
concurrent program, we associate layer numbers and layer ranges with variables 
(both global and local), atomic actions, and procedures. These layer numbers 
control the introduction and hiding of program variables and the summarization 
of compound operations into atomic actions, and thus provide the scaffolding of a 
refinement relation. Concretely, this section shows how the concurrent programs 
$P_1^{lock}$, $P_2^{lock}$, and $P_3^{lock}$ (Fig. 3) and their connections can all be expressed in a
single layered concurrent program. In Sect. 4, we discuss how to check refinement 
between the successive concurrent programs encoded in a layered concurrent 
program. 


Syntax. The syntax of layered concurrent programs is summarized in Fig. 4. Let
$\mathbb{N}$ be the set of non-negative integers and $\mathbb{I}$ the set of nonempty intervals [a, b].
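$$
\begin{array}{ll}
n \in \mathbb{N} & GS \in \mathit{GVar} \rightharpoonup \mathbb{I} \\
[a, b] = \{x \mid a, b, x \in \mathbb{N} \wedge a \le x \le b\} & AS \in A \mapsto (I, O, e, t, r) \\
r \in \mathbb{I} = \{[a, b] \mid a \le b\} & IS \in A \mapsto (I, O, e, t, n) \\
ns \in \mathit{LVar} \rightharpoonup \mathbb{N} & PS \in P \mapsto (I, O, L, s, n, ns, A) \\
 & m \in \mathit{Proc} \\
 & \mathcal{I} \in 2^{\mathit{Store}} \\
\end{array}
$$

$$
\mathcal{LP} \in \mathit{LayeredProg} ::= (GS, AS, IS, PS, m, \mathcal{I})
$$

$$
s \in \mathit{Stmt} ::= \cdots \mid \mathtt{icall}\ (A, \iota, o) \mid \mathtt{pcall}_\alpha\ \overline{(P_i, \iota_i, o_i)}_{i \in [1, k]} \qquad (\alpha \in \{\varepsilon\} \cup [1, k])
$$

Fig. 4. Layered concurrent programs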


We refer to integers as layer numbers and intervals as layer ranges. A layered
concurrent program $\mathcal{LP}$ is a tuple $(GS, AS, IS, PS, m, \mathcal{I})$ which, similarly to
concurrent programs, consists of global variables, atomic actions, and procedures,
with the following differences.


1. GS maps global variables to layer ranges. For GS(v) = [a, b] we say that v is
introduced at layer a and available up to layer b.

2. AS assigns a layer range r to atomic actions denoting the layers at which an 
action exists. 

3. IS (with a disjoint domain from AS) distinguishes a special type of atomic 
actions called introduction actions. Introduction actions have a single layer 
number n and are responsible for assigning meaning to the variables intro- 
duced at layer n. Correspondingly, statements in layered concurrent programs 
are extended with an icall statement for calling introduction actions. 

4. PS assigns a layer number n, a layer number mapping for local variables ns,
and an atomic action A to procedures. We call n the disappearing layer and A
the refined atomic action. For every local variable v, ns(v) is the introduction
layer of v.
The $\mathtt{pcall}_\alpha$ statement in a layered concurrent program differs from the pcall
statement in concurrent programs in two ways. First, it can only have
procedure arms. Second, it has a parameter $\alpha$ which is either $\varepsilon$ (unannotated pcall)
or the index of one of its arms (annotated pcall). We usually omit writing $\varepsilon$
in unannotated pcalls.

5. m is a procedure name.


The top layer h of a layered concurrent program is the disappearing layer of m.


Intuition Behind Layer Numbers. Recall that a layered concurrent program
$\mathcal{LP}$ should represent a sequence of h+1 concurrent programs $P_1, \ldots, P_{h+1}$ that
are connected by a sequence of h checker programs $C_1, \ldots, C_h$ (cf. Fig. 1). Before
we provide formal definitions, let us get some intuition on two core mechanisms:
global variable introduction and procedure abstraction/refinement.

Let v be a global variable with layer range [a, b]. The meaning of this layer
range is that the "first" program that contains v is $C_a$, the checker program
connecting $P_a$ and $P_{a+1}$. In particular, v is not yet part of $P_a$. In $C_a$ the
introduction actions at layer a can modify v and thus assign its meaning in terms of
all other available variables. Then v is part of $P_{a+1}$ and all programs up to and
including $P_b$. The "last" program containing v is $C_b$. In other words, when going
from a program $P_i$ to $P_{i+1}$ the variables with upper bound i disappear and the
variables with lower bound i are introduced; the checker program $C_i$ has access
to both and establishes their relationship.
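The bookkeeping implied by a layer range can be written down directly. The following Python sketch is only an illustration; the function names are introduced here and the range [1, 2] in the usage note is a hypothetical example.

```python
def containing_programs(a, b):
    """Indices i such that P_i contains a global variable with range [a, b]."""
    return list(range(a + 1, b + 1))  # P_{a+1}, ..., P_b


def containing_checkers(a, b):
    """Indices i such that the checker program C_i contains the variable."""
    return list(range(a, b + 1))  # C_a, ..., C_b


def visible_in_checker(ranges, i):
    """Variables that checker C_i can access, given a map name -> (a, b)."""
    return sorted(v for v, (a, b) in ranges.items() if a <= i <= b)
```

For a variable with the hypothetical range [1, 2], `containing_programs(1, 2)` yields `[2]` and `containing_checkers(1, 2)` yields `[1, 2]`: the variable is introduced in the checker program at layer 1, is part of the program at layer 2, and is last seen in the checker program at layer 2.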

Let P be a procedure with disappearing layer n and refined atomic action
A. The meaning of the disappearing layer is that P exists in all programs from
$P_1$ up to and including $P_n$. In $P_{n+1}$ and above every invocation of P is replaced
by an invocation of A. To ensure that this replacement is sound, the checker
program $C_n$ performs a refinement check that ensures that every execution of P
behaves like A. Observe that the body of procedure P itself changes from $P_1$ to
$P_n$ according to the disappearing layer of the procedures it calls.

With the above intuition in mind it is clear that the layer annotations in a 
layered concurrent program cannot be arbitrary. For example, if procedure P 
calls a procedure Q, then Q cannot have a higher disappearing layer than P, for 
Q could introduce further behaviors into the program after P was replaced by 
A, and those behaviors are not captured by A. 
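Both the replacement of procedures by their refined atomic actions and the constraint on calls can be sketched in Python. The table PROCS below is an assumption made for illustration, loosely modelled on the lock example (in particular, the entry for Worker is a guess); it maps each procedure name to its disappearing layer, its refined atomic action, and the procedures it calls.

```python
# Hypothetical annotations: name -> (disappearing layer, refined action, callees)
PROCS = {
    "Main": (2, "SKIP", ["Worker", "Main"]),
    "Worker": (2, "SKIP", ["Alloc", "Enter", "Leave"]),
    "Alloc": (1, "ALLOC", []),
    "Enter": (1, "ACQUIRE", ["Enter"]),
    "Leave": (1, "RELEASE", []),
}


def callee_in_program(procs, name, i):
    """What a call to procedure `name` looks like in program P_i."""
    layer, action, _ = procs[name]
    return name if i <= layer else action


def layering_violations(procs):
    """Pairs (P, Q) where P calls Q but Q disappears at a higher layer."""
    return [
        (p, q)
        for p, (n, _, callees) in procs.items()
        for q in callees
        if procs[q][0] > n
    ]
```

With these annotations a call to Enter stays a procedure call in the program at layer 1 and becomes a call to ACQUIRE at layer 2, and `layering_violations(PROCS)` is empty. Raising the disappearing layer of Enter above that of Worker would be reported as a violation.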


3.1 Type Checker 


We describe the constraints that need to be satisfied for a layered concurrent
program to be well-formed. A full formalization as a type checker with top-level
judgment $\vdash \mathcal{LP}$ is given in Fig. 5. For completeness, the type
checker includes standard constraints (e.g., variable scoping, parameter
passing, etc.) that we are not going to discuss.


(Atomic Action)/(Introduction Action). Global variables can only be
accessed by atomic actions and introduction actions. For a global variable v
with layer range [a, b], introduction actions with layer number a are allowed to
modify v (for sound variable introduction), and atomic actions with a layer range
contained in [a+1, b] have access to v. Introduction actions must be
nonblocking, which means that every state that satisfies the gate must have a
possible transition to take. This ensures that introduction actions only assign
meaning to introduced variables but do not exclude any program behavior.
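The access constraints in this paragraph amount to simple interval checks. The following Python sketch is a minimal illustration; the helper names and the encoding of layer ranges as `(lo, hi)` pairs are assumptions of the sketch, and the sample values are the layer annotations of the lock example.

```python
def contained(inner, outer):
    """True if the integer interval inner=(lo, hi) lies within outer=(lo, hi)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def atomic_may_access(action_range, var_range):
    """An atomic action with layer range r may access v@[a,b] iff r is within [a+1, b]."""
    a, b = var_range
    return contained(action_range, (a + 1, b))


def intro_may_write(n, var_range):
    """An introduction action at layer n may modify v@[a,b] iff a == n."""
    return var_range[0] == n


def intro_may_read(n, var_range):
    """An introduction action at layer n may read v@[a,b] iff n lies in [a, b]."""
    a, b = var_range
    return a <= n <= b


# CAS@[1,1] accesses b@[0,1]; ACQUIRE@[2,2] accesses lock@[1,2].
print(atomic_may_access((1, 1), (0, 1)))  # True
print(atomic_may_access((2, 2), (1, 2)))  # True
# CAS@[1,1] must not touch lock@[1,2], which only exists from layer 2 on.
print(atomic_may_access((1, 1), (1, 2)))  # False
# iSetLock@1 may write lock@[1,2] but not b@[0,1].
print(intro_may_write(1, (1, 2)), intro_may_write(1, (0, 1)))  # True False
```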


(If). Procedure bodies change from layer to layer because calls to procedures 
become calls to atomic actions. But the control-flow structure within a procedure 
is preserved across layers. Therefore (local) variables accessed in an if condition 
must be available on all layers to ensure that the if statement is well-defined on 
every layer. 


(Introduction Call). Let A be an introduction action with layer number n.
Since A modifies global variables introduced at layer n, icalls to A are only
allowed from procedures with disappearing layer n. Similarly, the formal output
parameters of an icall to A must have introduction layer n. The icall is only
preserved in $\mathcal{C}_n$.


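(Program)
\[
\frac{\begin{array}{c}
\mathit{dom}(AS) \cap \mathit{dom}(IS) = \emptyset \\
PS(m) = (\_, \_, \_, \_, h, \_, A_m) \qquad AS(A_m) = (\_, \_, \_, \_, r) \qquad h + 1 \in r \\
\forall A \in \mathit{dom}(AS) : (GS, AS) \vdash A \\
\forall A \in \mathit{dom}(IS) : (GS, IS) \vdash A \\
\forall P \in \mathit{dom}(PS) : (AS, IS, PS) \vdash P
\end{array}}{\vdash (GS, AS, IS, PS, m, \mathcal{I})}
\]

(Atomic action)
\[
\frac{\begin{array}{c}
AS(A) = (I, O, e, t, r) \qquad \mathit{Disjoint}(I, O) \\
\forall v \in \mathit{ReadVars}(e, t) : v \in I \lor r \subseteq \widehat{GS}(v) \\
\forall v \in \mathit{WriteVars}(t) : v \in O \lor r \subseteq \widehat{GS}(v)
\end{array}}{(GS, AS) \vdash A}
\]

(Introduction action)
\[
\frac{\begin{array}{c}
IS(A) = (I, O, e, t, n) \qquad \mathit{Disjoint}(I, O) \\
\forall v \in \mathit{ReadVars}(e, t) : v \in I \lor n \in GS(v) \\
\forall v \in \mathit{WriteVars}(t) : v \in O \lor GS(v) = [n, \_] \\
\mathit{Nonblocking}(e, t)
\end{array}}{(GS, IS) \vdash A}
\]

(Procedure)
\[
\frac{\begin{array}{c}
PS(P) = (I, O, L, s, n, ns, A) \qquad AS(A) = (I, O, \_, \_, \_) \\
\mathit{Disjoint}(I, O, L) \qquad \forall v \in I \cup O \cup L : ns(v) \le n \\
(AS, IS, PS), P \vdash s
\end{array}}{(AS, IS, PS) \vdash P}
\]

(Skip)
\[
\frac{}{(AS, IS, PS), P \vdash \mathtt{skip}}
\]

(Sequence)
\[
\frac{(AS, IS, PS), P \vdash s_1 \qquad (AS, IS, PS), P \vdash s_2}
     {(AS, IS, PS), P \vdash s_1 ; s_2}
\]

(If)
\[
\frac{\begin{array}{c}
PS(P) = (I, \_, L, \_, \_, ns, \_) \\
\forall x \in \mathit{ReadVars}(e) : x \in I \cup L \land ns(x) = 0 \\
(AS, IS, PS), P \vdash s_1 \qquad (AS, IS, PS), P \vdash s_2
\end{array}}{(AS, IS, PS), P \vdash \mathtt{if}\ e\ \mathtt{then}\ s_1\ \mathtt{else}\ s_2}
\]

(Parameter passing)
\[
\frac{\begin{array}{c}
\mathit{dom}(\iota) = I' \qquad \mathit{dom}(o) \subseteq O \cup L \\
\mathit{img}(\iota) \subseteq I \cup O \cup L \qquad \mathit{img}(o) \subseteq O'
\end{array}}{\mathit{ValidIO}(\iota, o, I, O, L, I', O')}
\]

(Introduction call)
\[
\frac{\begin{array}{c}
PS(P) = (I_P, O_P, L_P, \_, n_P, ns_P, \_) \qquad IS(A) = (I_A, O_A, \_, \_, n_A) \\
\mathit{ValidIO}(\iota, o, I_P, O_P, L_P, I_A, O_A) \qquad n_A = n_P \\
\forall v \in \mathit{dom}(o) : ns_P(v) = n_P
\end{array}}{(AS, IS, PS), P \vdash \mathtt{icall}\ (A, \iota, o)}
\]

(Parallel call)
\[
\frac{\begin{array}{c}
\forall i \ne j : \mathit{dom}(o_i) \cap \mathit{dom}(o_j) = \emptyset \land \mathit{dom}(o_i) \cap \mathit{img}(\iota_j) = \emptyset \\
PS(P) = (I_P, O_P, L_P, \_, n_P, ns_P, \_) \\
\forall i : \left\{\begin{array}{l}
PS(Q_i) = (I_i, O_i, \_, \_, n_i, ns_i, A_i) \qquad AS(A_i) = (\_, \_, \_, \_, r_i) \\
\mathit{ValidIO}(\iota_i, o_i, I_P, O_P, L_P, I_i, O_i) \\
\forall v \in \mathit{dom}(\iota_i) : ns_P(\iota_i(v)) \le ns_i(v) \\
\forall v \in \mathit{dom}(o_i) : ns_i(o_i(v)) \le ns_P(v) \\
n_i \le n_P \qquad [n_i + 1, n_P] \subseteq r_i \\
i = \alpha \implies n_i = n_P \land O_P \subseteq \mathit{dom}(o_i) \\
i \ne \alpha \land n_i = n_P \implies \mathit{dom}(o_i) \subseteq L_P
\end{array}\right. \\
\exists i : n_1 \le \dots \le n_i \ge \dots \ge n_k
\end{array}}{(AS, IS, PS), P \vdash \mathtt{pcall}_\alpha\ (Q_i, \iota_i, o_i)_{i \in [1, k]}}
\]

Auxiliary definitions:
\[
\begin{array}{rcl}
\widehat{GS}(v) &=& [a + 1, b] \quad \text{for } GS(v) = [a, b] \\
\mathit{ReadVars}(e) &=& \{ v \mid \exists \sigma, a : e(\sigma) \ne e(\sigma[v \mapsto a]) \} \\
\mathit{ReadVars}(t) &=& \{ v \mid \exists \sigma, \sigma', a : (\sigma, \sigma') \in t \land (\sigma[v \mapsto a], \sigma') \notin t \} \\
\mathit{ReadVars}(e, t) &=& \mathit{ReadVars}(e) \cup \mathit{ReadVars}(t) \\
\mathit{WriteVars}(t) &=& \{ v \mid \exists \sigma, \sigma' : (\sigma, \sigma') \in t \land \sigma(v) \ne \sigma'(v) \} \\
\mathit{Nonblocking}(e, t) &=& \forall \sigma \in e : \exists \sigma' : (\sigma, \sigma') \in t
\end{array}
\]

Fig. 5. Type checking rules for layered concurrent programs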

(Parallel Call). All arms in a pcall must be procedure arms invoking a
procedure with a disappearing layer less than or equal to the disappearing layer
of the caller. Furthermore, above the disappearing layer of the callee its
refined atomic action must be available up to the disappearing layer of the
caller. Parameter passing can only be well-defined if the actual inputs exist
before the formal inputs, and the formal outputs exist before the actual
outputs. The sequence of disappearing layers of the procedures in a pcall must
be monotonically increasing and then decreasing, such that the resulting pcall
in the extracted programs consists of procedure arms surrounded by atomic-action
arms on every layer.

Annotated pcalls are only used for invocations to procedures with the same
disappearing layer n as the caller. In particular, during refinement checking in
$\mathcal{C}_n$ only the arm with index $\alpha$ is allowed to modify the global
state, which must be according to the refined atomic action of the caller. The
remaining arms must leave the global state unchanged.
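The requirement that disappearing layers first increase and then decrease can be checked with one pass over the arms. The Python sketch below is an illustration under assumed names (`is_increasing_then_decreasing`, `arm_kinds`); it also shows why the condition yields procedure arms surrounded by atomic-action arms on every layer.

```python
def is_increasing_then_decreasing(layers):
    """True iff some i satisfies n_1 <= ... <= n_i >= ... >= n_k."""
    k = len(layers)
    i = 0
    # Consume the non-decreasing prefix.
    while i + 1 < k and layers[i] <= layers[i + 1]:
        i += 1
    # Consume the non-increasing suffix.
    while i + 1 < k and layers[i] >= layers[i + 1]:
        i += 1
    return i >= k - 1


def arm_kinds(layers, l):
    """'P' for arms that are still procedure arms at layer l, 'A' for atomic-action arms."""
    return "".join("P" if l <= n else "A" for n in layers)


print(is_increasing_then_decreasing([0, 1, 2, 1]))  # True
print(is_increasing_then_decreasing([2, 1, 2]))     # False
print(arm_kinds([0, 1, 2, 1], 1))                   # APPP
print(arm_kinds([0, 1, 2, 1], 2))                   # AAPA
```

For a sequence that passes the check, the procedure arms at any layer form one contiguous block, as in the last two outputs.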


3.2 Concurrent Program Extraction 


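Let $\mathcal{LP} = (GS, AS, IS, PS, m, \mathcal{I})$ be a layered concurrent
program such that $PS(m) = (\_, \_, \_, \_, h, \_, A_m)$. We show how to extract
the programs $\mathcal{P}_1, \dots, \mathcal{P}_{h+1}$ by defining a function
$\Gamma_\ell(\mathcal{LP})$ such that $\mathcal{P}_\ell = \Gamma_\ell(\mathcal{LP})$
for every $\ell \in [1, h+1]$. For a local variable layer mapping ns we define
the set of local variables with layer number less than $\ell$ as
$ns|_\ell = \{ v \mid ns(v) < \ell \}$. Now the extraction function
$\Gamma_\ell$ is defined as

\[
\Gamma_\ell(\mathcal{LP}) = (gs, as, ps, m', \mathcal{I}),
\]

where

\[
\begin{array}{rcl}
gs &=& \{ v \mid GS(v) = [a, b] \land \ell \in [a + 1, b] \}, \\
as &=& \{ A \mapsto (I, O, e, t) \mid AS(A) = (I, O, e, t, r) \land \ell \in r \}, \\
ps &=& \{ P \mapsto (I \cap ns|_\ell,\ O \cap ns|_\ell,\ L \cap ns|_\ell,\ \Gamma_\ell^P(s)) \mid PS(P) = (I, O, L, s, n, ns, \_) \land \ell \le n \}, \\
m' &=& \begin{cases} m & \text{if } \ell \in [1, h] \\ A_m & \text{if } \ell = h + 1 \end{cases}
\end{array}
\]

and the extraction of a statement in the body of procedure P is given by

\[
\begin{array}{rcl}
\Gamma_\ell^P(\mathtt{skip}) &=& \mathtt{skip}, \\
\Gamma_\ell^P(s_1 ; s_2) &=& \Gamma_\ell^P(s_1) ; \Gamma_\ell^P(s_2), \\
\Gamma_\ell^P(\mathtt{if}\ e\ \mathtt{then}\ s_1\ \mathtt{else}\ s_2) &=& \mathtt{if}\ e\ \mathtt{then}\ \Gamma_\ell^P(s_1)\ \mathtt{else}\ \Gamma_\ell^P(s_2), \\
\Gamma_\ell^P(\mathtt{icall}\ (A, \iota, o)) &=& \mathtt{skip}, \\
\Gamma_\ell^P(\mathtt{pcall}_\alpha\ \overline{(Q, \iota, o)}) &=& \mathtt{pcall}\ \overline{(X,\ \iota|_{ns_Q|_\ell},\ o|_{ns_P|_\ell})}
\end{array}
\]

\[
\text{for } PS(P) = (\_, \_, \_, \_, \_, ns_P, \_),\quad PS(Q) = (\_, \_, \_, \_, n, ns_Q, A),\quad
X = \begin{cases} Q & \text{if } \ell \le n \\ A & \text{if } \ell > n. \end{cases}
\]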


Thus $\mathcal{P}_\ell$ includes the global and local variables that were
introduced before $\ell$ and the atomic actions with $\ell$ in their layer
range. Furthermore, it does not contain introduction actions and correspondingly
all icall statements are removed. Every arm of a pcall statement, depending on
the disappearing layer n of the called procedure Q, either remains a procedure
arm to Q, or is replaced by an atomic-action arm to A, the atomic action refined
by Q. The input and output mappings are restricted to the local variables at
layer $\ell$. The set of initial stores of $\mathcal{P}_\ell$ is the same as for
$\mathcal{LP}$, since stores range over all program variables.

In our programming language, loops are subsumed by the more general
mechanism of recursive procedure calls. Observe that $\mathcal{P}_\ell$ can
indeed have recursive procedure calls, because our type checking rules (Fig. 5)
allow a pcall to invoke a procedure with the same disappearing layer as the
caller.


3.3 Running Example 


We return to our lock example from Sect. 2.1. Figure 6 shows its implementation
as the layered concurrent program $\mathcal{LP}^{lock}$. Layer annotations are
indicated using an @ symbol. For example, the global variable b has layer range
[0, 1], all occurrences of local variable tid have introduction layer 1, the
atomic action ACQUIRE has layer range [2, 2], and the introduction action
iSetLock has layer number 1.

First, observe that $\mathcal{LP}^{lock}$ is well-formed, i.e.,
$\vdash \mathcal{LP}^{lock}$. Then it is an easy exercise to verify that
$\Gamma_\ell(\mathcal{LP}^{lock}) = \mathcal{P}_\ell^{lock}$ for
$\ell \in [1, 3]$. Let us focus on procedure Worker. In $\mathcal{P}_1^{lock}$
(Fig. 3(a)) tid does not exist, and correspondingly Alloc, Enter, and Leave do
not have input or output parameters. Furthermore, the icall in the body of
Alloc is replaced with skip. In $\mathcal{P}_2^{lock}$ (Fig. 3(b)) we have tid,
and the calls to Alloc, Enter, and Leave are replaced with their respective
refined atomic actions ALLOC, ACQUIRE, and RELEASE. The only annotated pcall in
$\mathcal{LP}^{lock}$ is the recursive call to Enter.
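The "easy exercise" can be partly mechanized. The following Python sketch computes which globals, atomic actions, and procedures survive extraction at each layer of the lock example. It covers only the membership conditions of the extraction function, not the rewriting of procedure bodies. The dictionary encoding and the function name `extract` are assumptions of this sketch, as is reading the annotation SKIP@3 as the layer range [3, 3].

```python
# Layer annotations of the lock example (Fig. 6).
GS = {"b": (0, 1), "lock": (1, 2), "pos": (1, 1), "slots": (1, 2)}
AS = {"CAS": (1, 1), "RESET": (1, 1), "ALLOC": (2, 2),
      "ACQUIRE": (2, 2), "RELEASE": (2, 2),
      "SKIP": (3, 3)}  # assumption: SKIP@3 read as the range [3, 3]
PS = {"Main": 2, "Worker": 2, "Alloc": 1, "Enter": 1, "Leave": 1,
      "Cas": 0, "Reset": 0}  # procedure name -> disappearing layer


def extract(l):
    """Names of the globals, atomic actions, and procedures present at layer l."""
    gs = {v for v, (a, b) in GS.items() if a + 1 <= l <= b}
    acts = {A for A, (lo, hi) in AS.items() if lo <= l <= hi}
    procs = {P for P, n in PS.items() if l <= n}
    return gs, acts, procs


for l in (1, 2, 3):
    gs, acts, procs = extract(l)
    print(l, sorted(gs), sorted(acts), sorted(procs))
```

At layer 1 the sketch reports the global b with actions CAS and RESET; at layer 2 it reports lock and slots with ALLOC, ACQUIRE, and RELEASE; at layer 3 only SKIP remains. The variable pos appears at no layer.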

In addition to representing the concurrent programs in Fig. 3, the program
$\mathcal{LP}^{lock}$ also encodes the connection between them via introduction
actions and calls. The introduction action iSetLock updates lock to maintain the
relationship between lock and b, expressed by the predicate InvLock. It is
called in Enter in case the CAS operation successfully set b to true, and in
Leave when b is set to false. The introduction action iIncr implements linear
thread identifiers using the integer variable pos, which points to the next
value that can be allocated. For every allocation, the current value of pos is
returned as the new thread identifier and pos is incremented.

The variable slots is introduced at layer 1 to represent the set of unallocated
identifiers. It contains all integers no less than pos, an invariant that is
expressed by the predicate InvAlloc and maintained by the code of iIncr. The
purpose of slots is to encode linear allocation of thread identifiers in a way
that the body of iIncr can be locally shown to preserve the disjointness
invariant for linear variables; slots plays a similar role in the specification
of the atomic action ALLOC in $\mathcal{P}_2^{lock}$. The variable pos is both
introduced and hidden at layer 1 so that it exists neither in
$\mathcal{P}_1^{lock}$ nor in $\mathcal{P}_2^{lock}$. However, pos is present in
the checker program $\mathcal{C}_1^{lock}$ that connects $\mathcal{P}_1^{lock}$
and $\mathcal{P}_2^{lock}$.

$\mathcal{LP}^{lock}$

    var b@[0,1] : bool
    var lock@[1,2] : int
    var pos@[1,1] : int
    var linear slots@[1,2] : set<int>

    predicate InvLock
      b <==> lock != 0

    predicate InvAlloc
      pos > 0 && slots == [pos,∞)

    init InvLock && InvAlloc

    both SKIP@3()
      skip

    proc Main@2()
    refines SKIP
      if (*)
        pcall Worker(), Main()

    proc Worker@2()
    refines SKIP
      var linear tid@1 : int
      pcall tid := Alloc()
      pcall Enter(tid)
      pcall Leave(tid)

    right ALLOC@[2,2]() : (linear tid : int)
      assume tid != 0 && tid ∈ slots
      slots := slots - tid

    proc Alloc@1() : (linear tid@1 : int)
    refines ALLOC
      icall tid := iIncr()

    iaction iIncr@1() : (linear tid : int)
      assert InvAlloc
      tid := pos
      pos := pos + 1
      slots := slots - tid

    right ACQUIRE@[2,2](linear tid : int)
      assert tid != 0
      assume lock == 0
      lock := tid

    left RELEASE@[2,2](linear tid : int)
      assert tid != 0 && lock == tid
      lock := 0

    proc Enter@1(linear tid@1 : int)
    refines ACQUIRE
      var success@0 : bool
      pcall success := Cas()
      if (success)
        icall iSetLock(tid)
      else
        pcall_1 Enter(tid)

    proc Leave@1(linear tid@1 : int)
    refines RELEASE
      pcall Reset()
      icall iSetLock(0)

    iaction iSetLock@1(v : int)
      lock := v

    atomic CAS@[1,1]() : (success : bool)
      if (b) success := false
      else success, b := true, true

    atomic RESET@[1,1]()
      assert b
      b := false

    proc Cas@0() : (success@0 : bool)
    refines CAS

    proc Reset@0()
    refines RESET

Fig. 6. Lock example (layered concurrent program)


The bodies of procedures Cas and Reset are not shown in Fig. 6 because
they are not needed. They disappear at layer 0 and are replaced by the atomic
actions CAS and RESET, respectively, in $\mathcal{P}_1^{lock}$.

The degree of compactness afforded by layered programs (as in Fig. 6) over 
separate specification of each concurrent program (as in Fig. 3) increases rapidly 
with the size of the program and the maximum depth of procedure calls. In our 
experience, for realistic programs such as a concurrent garbage collector [7] or a 
data-race detector [15], the saving in code duplication is significant. 


4 Refinement Checking 


Section 3 described how a layered concurrent program $\mathcal{LP}$ encodes a
sequence $\mathcal{P}_1, \dots, \mathcal{P}_h, \mathcal{P}_{h+1}$ of concurrent
programs. In this section, we show how the safety of any concurrent program in
the sequence is implied by the safety of its successor, ultimately allowing the
safety of $\mathcal{P}_1$ to be established by the safety of
$\mathcal{P}_{h+1}$.

There are three ingredients to connecting $\mathcal{P}_\ell$ to
$\mathcal{P}_{\ell+1}$ for any $\ell \in [1, h]$: reduction, projection, and
abstraction. Reduction allows us to conclude the safety of a concurrent program
under preemptive semantics by proving safety only under cooperative semantics.


Theorem 1 (Reduction). Let P be a concurrent program. If MSafe(P) and 
CSafe(P), then Safe(P). 


The judgment MSafe(P) uses logical commutativity reasoning and mover types
to ensure that cooperative safety is sufficient for preemptive safety (Sect. 4.1).
We use this theorem to justify reasoning about CSafe($\mathcal{P}_\ell$) rather
than Safe($\mathcal{P}_\ell$).

The next step in connecting $\mathcal{P}_\ell$ to $\mathcal{P}_{\ell+1}$ is to
add the computation introduced at layer $\ell$ to the cooperative semantics of
$\mathcal{P}_\ell$. This computation comprises global and local variables
together with introduction actions and calls to them. We refer to the resulting
program at layer $\ell$ as $\widetilde{\mathcal{P}}_\ell$.


Theorem 2 (Projection). Let $\mathcal{LP}$ be a layered concurrent program with
top layer h and $\ell \in [1, h]$. If CSafe($\widetilde{\mathcal{P}}_\ell$),
then CSafe($\mathcal{P}_\ell$).


Since introduction actions are nonblocking and $\widetilde{\mathcal{P}}_\ell$ is
safe under cooperative semantics, every cooperative execution of
$\mathcal{P}_\ell$ can be obtained from a cooperative execution of
$\widetilde{\mathcal{P}}_\ell$ by projecting away the computation introduced at
layer $\ell$. This observation allows us to conclude that every cooperative
execution of $\mathcal{P}_\ell$ is also safe.

Finally, we check that the safety of the cooperative semantics of
$\widetilde{\mathcal{P}}_\ell$ is ensured by the safety of the preemptive
semantics of the next concurrent program $\mathcal{P}_{\ell+1}$. This connection
is established by reasoning about the cooperative semantics of a concurrent
checker program $\mathcal{C}_\ell$ that is automatically constructed from
$\mathcal{LP}$.


Theorem 3 (Abstraction). Let $\mathcal{LP}$ be a layered concurrent program with
top layer h and $\ell \in [1, h]$. If CSafe($\mathcal{C}_\ell$) and
Safe($\mathcal{P}_{\ell+1}$), then CSafe($\widetilde{\mathcal{P}}_\ell$).


The checker program $\mathcal{C}_\ell$ is obtained by instrumenting the code of
$\widetilde{\mathcal{P}}_\ell$ with extra variables and procedures that enable
checking that procedures disappearing at layer $\ell$ refine their atomic action
specifications (Sect. 4.2).

Our refinement check between two consecutive layers is summarized by the 
following corollary of Theorems 1-3. 


Corollary 1. Let $\mathcal{LP}$ be a layered concurrent program with top layer h
and $\ell \in [1, h]$. If MSafe($\mathcal{P}_\ell$), CSafe($\mathcal{C}_\ell$)
and Safe($\mathcal{P}_{\ell+1}$), then Safe($\mathcal{P}_\ell$).
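Corollary 1 follows by chaining the three theorems. The derivation below is a sketch; it writes $\widetilde{\mathcal{P}}_\ell$ for the program obtained from $\mathcal{P}_\ell$ by adding the computation introduced at layer $\ell$, which is a notational assumption of this sketch.

```latex
\begin{align*}
\mathit{CSafe}(\mathcal{C}_\ell) \wedge \mathit{Safe}(\mathcal{P}_{\ell+1})
  &\implies \mathit{CSafe}(\widetilde{\mathcal{P}}_\ell) && \text{(Theorem 3)} \\
  &\implies \mathit{CSafe}(\mathcal{P}_\ell) && \text{(Theorem 2)} \\
\mathit{MSafe}(\mathcal{P}_\ell) \wedge \mathit{CSafe}(\mathcal{P}_\ell)
  &\implies \mathit{Safe}(\mathcal{P}_\ell) && \text{(Theorem 1)}
\end{align*}
```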


The soundness of our refinement checking methodology for layered concurrent 
programs is obtained by repeated application of Corollary 1. 


Corollary 2. Let $\mathcal{LP}$ be a layered concurrent program with top layer
h. If MSafe($\mathcal{P}_\ell$) and CSafe($\mathcal{C}_\ell$) for all
$\ell \in [1, h]$ and Safe($\mathcal{P}_{h+1}$), then Safe($\mathcal{P}_1$).


4.1 From Preemptive to Cooperative Semantics


We present the judgment MSafe($\mathcal{P}$) that allows us to reason about a
concurrent program $\mathcal{P}$ under cooperative semantics instead of
preemptive semantics. Intuitively, we want to use the commutativity of
individual atomic actions to rearrange the steps of any execution under
preemptive semantics in such a way that it corresponds to an execution under
cooperative semantics. We consider mappings
$M \in \mathit{Action} \to \{N, R, L, B\}$ that assign mover types to atomic
actions; N for non-mover, R for right mover, L for left mover, and B for both
mover. The judgment MSafe($\mathcal{P}$) requires a mapping M that satisfies two
conditions.

First, the atomic actions in P must satisfy the following logical commutativity 
conditions [7], which can be discharged by a theorem prover. 


- Commutativity: If $A_1$ is a right mover or $A_2$ is a left mover, then the
  effect of $A_1$ followed by $A_2$ can also be achieved by $A_2$ followed by
  $A_1$.

- Forward preservation: If $A_1$ is a right mover or $A_2$ is a left mover, then
  the failure of $A_2$ after $A_1$ implies that $A_2$ must also fail before
  $A_1$.

- Backward preservation: If $A_2$ is a left mover (and $A_1$ is arbitrary), then
  the failure of $A_1$ before $A_2$ implies that $A_1$ must also fail after
  $A_2$.

- Nonblocking: If A is a left mover, then A cannot block.
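To make the commutativity condition concrete, the following Python sketch checks it on a tiny finite model of the lock actions. The model is an assumption of this sketch: the state is just the value of lock (0 for free, otherwise a thread identifier), each action is a set of (pre, post) pairs restricted to states where it is enabled and its gate holds, and only the commutativity condition is checked, not the two preservation conditions.

```python
def acquire(tid):
    """ACQUIRE(tid): enabled only when the lock is free, then lock becomes tid."""
    return {(0, tid)}


def release(tid):
    """RELEASE(tid): restricted to states satisfying its gate lock == tid."""
    return {(tid, 0)}


def compose(r1, r2):
    """Relational composition: r1 followed by r2."""
    return {(s, u) for (s, t1) in r1 for (t2, u) in r2 if t1 == t2}


def commutes(first, second):
    """The effect of first;second can also be achieved by second;first."""
    return compose(first, second) <= compose(second, first)


other_thread_actions = [acquire(2), release(2)]

# ACQUIRE(1) moves to the right of every action of thread 2 ...
print(all(commutes(acquire(1), a) for a in other_thread_actions))  # True
# ... but not to the left of RELEASE(2).
print(commutes(release(2), acquire(1)))                            # False
# RELEASE(1) moves to the left of every action of thread 2 ...
print(all(commutes(a, release(1)) for a in other_thread_actions))  # True
# ... but not to the right of ACQUIRE(2).
print(commutes(release(1), acquire(2)))                            # False
```

The outcome agrees with the mover types declared in the lock example, where ACQUIRE is a right mover and RELEASE is a left mover.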


Second, the sequence of atomic actions in preemptive executions of $\mathcal{P}$
must be such that the desired rearrangement into cooperative executions is
possible. Given a preemptive execution, consider, for each thread individually,
a labeling of execution steps where atomic action steps are labeled with their
mover type and procedure calls and returns are labeled with Y (for yield). The
nondeterministic atomicity automaton $\mathcal{A}$, with states RM and LM,
defines all allowed sequences. Intuitively, when we map the execution steps of a
thread to a run in the automaton, the state RM denotes that we are in the right
mover phase in which we can stay until the occurrence of a non-right mover (L or
N). Then we can stay in the left mover phase (state LM) by executing left
movers, until a preemption point (Y) takes us back to RM. Let $\mathcal{E}$ be
the mapping from edge labels to the set of edges that contain the label, e.g.,
$\mathcal{E}(R) = \{RM \to RM, RM \to LM\}$. Thus we have a representation of
mover types as sets of edges in $\mathcal{A}$, and we define
$\mathcal{E}(A) = \mathcal{E}(M(A))$. Notice that the set representation is
closed under relation composition $\circ$ and intersection, and behaves as
expected, e.g., $\mathcal{E}(R) \circ \mathcal{E}(L) = \mathcal{E}(N)$.

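Now we define an intraprocedural control flow analysis that lifts $\mathcal{E}$
to a mapping $\widehat{\mathcal{E}}$ on statements. Intuitively,
$x \to y \in \widehat{\mathcal{E}}(s)$ means that every execution of the
statement s has a run in $\mathcal{A}$ from x to y. Our analysis does not have
to be interprocedural, since procedure calls and returns are labeled with Y,
allowing every possible state transition in $\mathcal{A}$. MSafe($\mathcal{P}$)
requires $\widehat{\mathcal{E}}(s) \ne \emptyset$ for every procedure body s in
$\mathcal{P}$, where $\widehat{\mathcal{E}}$ is defined as follows:

\[
\begin{array}{rcl}
\widehat{\mathcal{E}}(\mathtt{skip}) &=& \mathcal{E}(B) \\
\widehat{\mathcal{E}}(s_1 ; s_2) &=& \widehat{\mathcal{E}}(s_1) \circ \widehat{\mathcal{E}}(s_2) \\
\widehat{\mathcal{E}}(\mathtt{if}\ e\ \mathtt{then}\ s_1\ \mathtt{else}\ s_2) &=& \widehat{\mathcal{E}}(s_1) \cap \widehat{\mathcal{E}}(s_2) \\
\widehat{\mathcal{E}}(\mathtt{pcall}\ \overline{A}_1\, \overline{P}\, \overline{A}_2) &=&
\begin{cases}
\mathcal{E}^*(\overline{A}_1\, \overline{A}_2) & \text{if } \overline{P} = \varepsilon \\
\mathcal{E}(L) \circ \mathcal{E}^*(\overline{A}_1) \circ \mathcal{E}(Y) \circ \mathcal{E}^*(\overline{A}_2) \circ \mathcal{E}(R) & \text{if } \overline{P} \ne \varepsilon
\end{cases}
\end{array}
\]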


Skip is a both mover, sequencing composes edges, and if takes the edges
possible in both branches. In the arms of a pcall we omit writing the input and
output maps because they are irrelevant to the analysis. Let us first focus on
the case $\overline{P} = \varepsilon$ with no procedure arms. In the preemptive
semantics all arms are arbitrarily interleaved and correspondingly we define the
function

\[
\mathcal{E}^*(A_1 \cdots A_n) = \bigcap_{\pi \in S_n} \mathcal{E}(A_{\pi(1)}) \circ \cdots \circ \mathcal{E}(A_{\pi(n)})
\]

to consider all possible permutations ($\pi$ ranges over the symmetric group
$S_n$) and take the edges possible in all permutations. Observe that
$\mathcal{E}^*$ evaluates to non-empty in exactly four cases: $\mathcal{E}(N)$
for $\{B\}^* N \{B\}^*$, $\mathcal{E}(B)$ for $\{B\}^*$, $\mathcal{E}(R)$ for
$\{R, B\}^* \setminus \{B\}^*$, and $\mathcal{E}(L)$ for
$\{L, B\}^* \setminus \{B\}^*$. These are the mover-type sequences for which an
arbitrary permutation (coming from a preemptive execution) can be rearranged to
the order given by the pcall (corresponding to cooperative execution).
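The four non-empty cases can be checked by brute force. The Python sketch below represents each mover type as a set of automaton edges and intersects the compositions over all permutations. Only the edge set for R is stated in the text; the edge sets for L, B, and N are assumptions of this sketch, chosen so that they agree with the stated identity for R composed with L and with the description of the two phases.

```python
from itertools import permutations

RM, LM = "RM", "LM"

# Mover types as sets of edges of the atomicity automaton.
E = {
    "R": {(RM, RM), (RM, LM)},            # stated in the text
    "L": {(RM, LM), (LM, LM)},            # assumed
    "B": {(RM, RM), (RM, LM), (LM, LM)},  # assumed
    "N": {(RM, LM)},                      # assumed
}


def compose(r1, r2):
    """Relational composition of two edge sets."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}


def e_star(movers):
    """Edges possible under every permutation of a non-empty mover sequence."""
    result = None
    for perm in permutations(movers):
        edges = E[perm[0]]
        for m in perm[1:]:
            edges = compose(edges, E[m])
        result = edges if result is None else result & edges
    return result


print(compose(E["R"], E["L"]) == E["N"])  # True
print(e_star("BNB") == E["N"])            # True
print(e_star("RB") == E["R"])             # True
print(e_star("LB") == E["L"])             # True
print(e_star("RL"))                       # set()
print(e_star("NN"))                       # set()
```

Sequences that mix right and left movers, or contain two non-movers, yield the empty set, so a pcall with such atomic-action arms would be rejected.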

In the case $\overline{P} \ne \varepsilon$ there is a preemption point under
cooperative semantics between $\overline{A}_1$ and $\overline{A}_2$, the actions
in $\overline{A}_1$ are executed in order before the preemption, and the actions
in $\overline{A}_2$ are executed in order after the preemption. To ensure that
the cooperative execution can simulate an arbitrarily interleaved preemptive
execution of the pcall, we must be able to move actions in $\overline{A}_1$ to
the left and actions in $\overline{A}_2$ to the right of the preemption point.
We enforce this condition by requiring that $\overline{A}_1$ is all left (or
both) movers and $\overline{A}_2$ all right (or both) movers, expressed by the
leading $\mathcal{E}(L)$ and trailing $\mathcal{E}(R)$ in the edge composition.


4.2 Refinement Checker Programs 


In this section, we describe the construction of checker programs that justify
the formal connection between successive concurrent programs in a layered
concurrent program. The description is done by example. In particular, we show
the checker program $\mathcal{C}_1^{lock}$ that establishes the connection
between $\mathcal{P}_1^{lock}$ and $\mathcal{P}_2^{lock}$ (Fig. 3) of our
running example.


Overview. Cooperative semantics splits any execution of $\mathcal{P}_1^{lock}$
into a sequence of preemption-free execution fragments separated by preemptions.
Verification of $\mathcal{C}_1^{lock}$ must ensure that for all such executions,
the set of procedures that disappear at layer 1 behave like their atomic action
specifications. That is, the procedures Enter and Leave must behave like their
specifications ACQUIRE and RELEASE, respectively. It is important to note that
this goal of checking refinement is easier than verifying that
$\mathcal{P}_1^{lock}$ is safe. Refinement checking may succeed even though
$\mathcal{P}_1^{lock}$ fails; the guarantee of refinement is that such a failure
can be simulated by a failure in $\mathcal{P}_2^{lock}$. The construction of
$\mathcal{C}_1^{lock}$ can be understood in two steps. First, the program
$\widetilde{\mathcal{P}}_1^{lock}$ shown in Fig. 7 extends
$\mathcal{P}_1^{lock}$ (Fig. 3(a)) with the variables introduced at layer 1
(globals lock, pos, slots and locals tid) and the corresponding introduction
actions (iIncr and iSetLock). Second, $\mathcal{C}_1^{lock}$ is obtained from
$\widetilde{\mathcal{P}}_1^{lock}$ by instrumenting the procedures to encode the
refinement check, described in the remainder of this section.


$\widetilde{\mathcal{P}}_1^{lock}$

    var b : bool
    var lock : int
    var pos : int
    var linear slots : set<int>

    proc Main()
      if (*)
        pcall Worker(), Main()

    proc Worker()
      var linear tid : int
      pcall tid := Alloc()
      pcall Enter(tid)
      pcall Leave(tid)

    proc Alloc() : (linear tid : int)
      icall tid := iIncr()

    iaction iIncr() : (tid : int)
      assert InvAlloc
      tid := pos
      pos := pos + 1
      slots := slots - tid

    proc Enter(linear tid : int)
      var success : bool
      pcall success := CAS()
      if (success)
        icall iSetLock(tid)
      else
        pcall Enter(tid)

    proc Leave(linear tid : int)
      pcall RESET()
      icall iSetLock(0)

    iaction iSetLock(v : int)
      lock := v

    atomic CAS() : (success : bool)
      if (b) success := false
      else success, b := true, true

    atomic RESET()
      assert b
      b := false

Fig. 7. Lock example (variable introduction at layer 1)


Context for Refinement. There are two kinds of procedures, those that continue
to exist at layer 2 (such as Main and Worker) and those that disappear at
layer 1 (such as Enter and Leave). $\mathcal{C}_1^{lock}$ does not need to
verify anything about the first kind. These procedures only provide the context
for refinement checking, and thus every invocation of an atomic action
$(I, O, e, t)$ in any atomic-action arm of a pcall is converted into the
invocation of a fresh atomic action $(I, O, \mathit{true}, e \land t)$.
In other words, the assertions in procedures that continue to exist at layer 2
are converted into assumptions for the refinement checking at layer 1; these
assertions are verified during the refinement checking on a higher layer. In our
example, Main and Worker do not have atomic-action arms, although this is
possible in general.
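The conversion of gates into assumptions can be sketched as a small Python function. The encoding is an assumption of this sketch: stores are dictionaries, the gate e is a predicate on the pre-store, and the transition relation t is a predicate on a pair of stores. The function name `as_context_action` is hypothetical.

```python
def as_context_action(action):
    """Turn (I, O, e, t) into (I, O, true, e AND t)."""
    inputs, outputs, gate, trans = action
    return (
        inputs,
        outputs,
        lambda store: True,                                   # the new gate never fails
        lambda pre, post: gate(pre) and trans(pre, post),     # the old gate becomes an assumption
    )


# RESET from the lock example: gate "assert b", transition "b := false".
RESET = ((), (), lambda s: s["b"], lambda pre, post: post["b"] is False)
_, _, new_gate, new_trans = as_context_action(RESET)

print(new_gate({"b": False}))                  # True: no failure is possible
print(new_trans({"b": True}, {"b": False}))    # True: allowed step
print(new_trans({"b": False}, {"b": False}))   # False: blocked instead of failing
```

A state that violated the original gate now simply has no transition, which is how an assertion turns into an assumption.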


Refinement Instrumentation. We illustrate the instrumentation of procedures
Enter and Leave in Fig. 8. The core idea is to track updates by preemption-free
execution fragments to the shared variables that continue to exist at layer 2.
There are two such variables: lock and slots. We capture snapshots of lock
and slots in the local variables _lock and _slots and use these snapshots
to check that the updates to lock and slots behave according to the refined
atomic action. In general, any path from the start to the end of the body of a
procedure may comprise many preemption-free execution fragments. The checker
program must ensure that exactly one of these fragments behaves like the
specified atomic action; all other fragments must leave lock and slots
unchanged. To track whether the atomic action has already happened, we use two
local Boolean variables: pc and done. Both variables are initialized to false,
get updated to true during the execution, and remain at true thereafter. The
variable pc is set to true at the end of the first preemption-free execution
fragment that modifies the tracked state, which is expressed by the macro
*CHANGED* on line 1. The variable done is set to true at the end of the first
preemption-free execution fragment that behaves like the refined atomic action.
For that, the macros *RELEASE* and *ACQUIRE* on lines 2 and 3 express the
transition relations of RELEASE and ACQUIRE, respectively. Observe that we have
the invariant pc ==> done. The reason we need both pc and done is to handle the
case where the refined atomic action may stutter (i.e., leave the state
unchanged).

$\mathcal{C}_1^{lock}$

     1  macro *CHANGED* is !(lock == _lock && slots == _slots)
     2  macro *RELEASE* is lock == 0 && slots == _slots
     3  macro *ACQUIRE* is _lock == 0 && lock == tid && slots == _slots
     4
     5  proc Leave(linear tid)                       # Leave must behave like RELEASE
     6    var _lock, _slots, pc, done
     7    pc, done := false, false                   # initialize pc and done
     8    _lock, _slots := lock, slots               # take snapshot of global variables
     9    assume pc || (tid != 0 && lock == tid)     # assume gate of RELEASE
    10
    11    pcall RESET()
    12    icall iSetLock(0)
    13
    14    assert *CHANGED* ==> (!pc && *RELEASE*)    # state change must be the first and like RELEASE
    15    pc := pc || *CHANGED*                      # track if state changed
    16    done := done || *RELEASE*                  # track if RELEASE happened
    17
    18    assert done                                # check that RELEASE happened
    19
    20  proc Enter(linear tid)                       # Enter must behave like ACQUIRE
    21    var success, _lock, _slots, pc, done
    22    pc, done := false, false                   # initialize pc and done
    23    _lock, _slots := lock, slots               # take snapshot of global variables
    24    assume pc || tid != 0                      # assume gate of ACQUIRE
    25
    26    pcall success := CAS()
    27    if (success)
    28      icall iSetLock(tid)
    29    else
    30      assert *CHANGED* ==> (!pc && *ACQUIRE*)  # state change must be the first and like ACQUIRE
    31      pc := pc || *CHANGED*                    # track if state changed
    32      done := done || *ACQUIRE*                # track if ACQUIRE happened
    33
    34      if (*)                                   # then: check refinement of caller
    35        pcall pc := Check_Enter_Enter(tid,     # check annotated procedure arm
    36                                      tid, pc) # (defined below)
    37        done := true                           # ACQUIRE is deemed to happen in the annotated arm
    38      else                                     # else: check refinement of callee
    39        pcall Enter(tid)
    40        assume false                           # block the execution once the callee returns
    41
    42      _lock, _slots := lock, slots             # take snapshot of global variables
    43      assume pc || tid != 0                    # assume gate of ACQUIRE
    44
    45    assert *CHANGED* ==> (!pc && *ACQUIRE*)    # state change must be the first and like ACQUIRE
    46    pc := pc || *CHANGED*                      # track if state changed
    47    done := done || *ACQUIRE*                  # track if ACQUIRE happened
    48
    49    assert done                                # check that ACQUIRE happened
    50
    51  proc Check_Enter_Enter(tid, x, pc) : (pc')   # check annotated pcall from Enter to Enter
    52    var _lock, _slots
    53    _lock, _slots := lock, slots               # take snapshot of global variables
    54    assume pc || tid != 0                      # assume gate of ACQUIRE
    55
    56    pcall ACQUIRE(x)                           # use ACQUIRE to "simulate" call to Enter
    57
    58    assert *ACQUIRE*
    59    assert *CHANGED* ==> !pc
    60    pc' := pc || *CHANGED*

Fig. 8. Instrumented procedures Enter and Leave (layer 1 checker program)
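The bookkeeping with pc and done can be simulated outside the verifier. The Python sketch below replays a list of preemption-free fragments, each given as a pair of tracked states (snapshot, state at the end of the fragment), and applies the same three steps as the instrumentation: the assertion, the update of pc, and the update of done. The function names and the encoding of the tracked state as a pair `(lock, slots)` are assumptions of this sketch, and the gate assumptions are omitted.

```python
def check_refinement(fragments, spec):
    """Return True iff the fragments pass the pc/done refinement check."""
    pc = done = False
    for before, after in fragments:
        changed = before != after
        # assert *CHANGED* ==> (!pc && *SPEC*)
        if changed and (pc or not spec(before, after)):
            return False
        pc = pc or changed
        done = done or spec(before, after)
        assert (not pc) or done  # the invariant pc ==> done
    return done  # the final "assert done"


def release_spec(before, after):
    """Transition relation of RELEASE: lock becomes 0, slots is unchanged."""
    (_, slots_before), (lock_after, slots_after) = before, after
    return lock_after == 0 and slots_after == slots_before


S = frozenset({5, 6})
print(check_refinement([((1, S), (0, S))], release_spec))                    # True
print(check_refinement([((1, S), (1, S))], release_spec))                    # False
print(check_refinement([((1, S), (0, S)), ((0, S), (2, S))], release_spec))  # False
print(check_refinement([((0, S), (0, S))], release_spec))                    # True
```

The last call is the stuttering case: the state does not change, so pc stays false, yet the fragment satisfies the specification and done becomes true. This is why the invariant is an implication rather than an equality.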


Instrumenting Leave. We first look at the instrumentation of Leave. Line 8
initializes the snapshot variables. Recall that a preemption inside the code of
a procedure is introduced only at a pcall containing a procedure arm.
Consequently, the body of Leave is preemption-free and we need to check
refinement across a single execution fragment. This checking is done by lines
14-16. The assertion on line 14 checks that if any tracked variable has changed
since the last snapshot, (1) such a change happens for the first time (!pc), and
(2) the current value is related to the snapshot value according to the
specification of RELEASE. Line 15 updates pc to track whether any change to the
tracked variables has happened so far. Line 16 updates done to track whether
RELEASE has happened so far. The assertion at line 18 checks that RELEASE has
indeed happened before Leave returns. The assumption at line 9 blocks those
executions which can be simulated by the failure of RELEASE. It achieves this
effect by assuming the gate of RELEASE in states where pc is still false (i.e.,
RELEASE has not yet happened). The assumption yields the constraint lock != 0,
which together with the invariant InvLock (Fig. 6) proves that the gate of RESET
does not fail.

The verification of Leave illustrates an important principle of our approach
to refinement. The gates of atomic actions invoked by a procedure P disappearing
at layer $\ell$ are verified using a combination of invariants established on
$\mathcal{C}_\ell$ and pending assertions at layer $\ell + 1$ encoded as the
gate of the atomic action refined by P. For Leave specifically, assert b in
RESET is propagated to assert tid != 0 && lock == tid in RELEASE. The latter
assertion is verified in the checker program $\mathcal{C}_2^{lock}$ when Worker,
the caller of RELEASE, is shown to refine the action SKIP, which is guaranteed
not to fail since its gate is true.


Instrumenting Enter. The most sophisticated feature in a concurrent pro- 
gram is a pcall. The instrumentation of Leave explains the instrumentation of 
the simplest kind of pcall with only atomic-action arms. We now illustrate the 
instrumentation of a pcall containing a procedure arm using the procedure Enter 
which refines the atomic action ACQUIRE and contains a pcall to Enter itself. The 
instrumentation of this pcall is contained in lines 30-43. 

A pcall with a procedure arm is challenging for two reasons. First, the callee
disappears at the same layer as the caller so the checker program must reason
about refinement for both the caller and the callee. This challenge is addressed
by the code in lines 34-40. At line 34, we introduce a nondeterministic choice
between two code paths: the then branch to check refinement of the caller and
the else branch to check refinement of the callee. An explanation for this
nondeterministic choice is given in the next two paragraphs. Second, a pcall
with a procedure arm introduces a preemption creating multiple preemption-free
execution fragments. This challenge is addressed by two pieces of code. First,
we check that lock and slots are updated correctly (lines 30-32) by the
preemption-free execution fragment ending before the pcall. Second, we update
the snapshot variables (line 42) to enable the verification of the
preemption-free execution fragment beginning after the pcall.

Lines 35-37 in the then branch check refinement against the atomic action
specification of the caller, exploiting the atomic action specification of the
callee. The actual verification is performed in a fresh procedure
Check_Enter_Enter invoked on line 35. Notice that this procedure depends on both
the caller and the callee (as reflected in its name), and that it preserves a
necessary preemption point. The procedure has input parameters tid to receive
the input of the caller (for refinement checking) and x to receive the input of
the callee (to generate the behavior of the callee). Furthermore, pc may be
updated in Check_Enter_Enter and is thus passed as both an input and output
parameter. In the body of the procedure, the invocation of action ACQUIRE on
line 56 overapproximates the behavior of the callee. In the layered concurrent
program (Fig. 6), the (recursive) pcall to Enter in the body of Enter is
annotated with 1. This annotation indicates that for any execution passing
through this pcall, ACQUIRE is deemed to occur during the execution of its
unique arm. This is reflected in the checker program by updating done to true on
line 37; the update is justified because of the assertion in Check_Enter_Enter
at line 58. If the pcall being translated was instead unannotated, line 37 would
be omitted.

Lines 39-40 in the else branch ensure that using the atomic action
specification of the callee on line 56 is justified. Allowing the execution to
continue to the callee ensures that the called procedure is invoked in all
states allowed by $\mathcal{P}_1$. However, the execution is blocked once the
call returns to ensure that downstream code sees the side-effect on pc and the
snapshot variables.

To summarize, the crux of our instrumentation of procedure arms is to com- 
bine refinement checking of caller and callee. We explore the behaviors of the 
callee to check its refinement. At the same time, we exploit the atomic action 
specification of the callee to check refinement of the caller. 


Instrumenting Unannotated Procedure Arms. Procedure Enter illustrates the
instrumentation of an annotated procedure arm. The instrumentation of an
unannotated procedure arm (both in an annotated or unannotated pcall) is
simpler, because we only need to check that the tracked state is not modified.
For such an arm to a procedure refining atomic action Action, we introduce a
procedure Check_Action (which is independent of the caller) comprising three
instructions: take snapshots, pcall Action, and assert !*CHANGED*.


Pcalls with Multiple Arms. Our examples show the instrumentation of pcalls
with a single arm. Handling multiple arms is straightforward, since each arm is
translated independently. Atomic action arms stay unmodified, annotated
procedure arms are replaced with the corresponding Check_Caller_Callee
procedure, and unannotated procedure arms are replaced with the corresponding
Check_Action procedure.


Output Parameters. Our examples illustrate refinement checking for atomic 
actions that have no output parameters. In general, a procedure and its atomic 
action specification may return values in output parameters. We handle this 
generalization but lack of space does not allow us to present the technical details. 


5 Conclusion 


In this paper, we presented layered concurrent programs, a programming
notation to succinctly capture a multi-layered refinement proof capable of
connecting a deeply-detailed implementation to a highly-abstract specification.
We presented an algorithm to extract from the layered concurrent program the
individual concurrent programs, from the most concrete to the most abstract. We
also presented an algorithm to extract a collection of refinement checker
programs that establish the connection among the sequence of concurrent
programs encoded by the layered concurrent program. The cooperative safety of
the checker programs and the preemptive safety of the most abstract concurrent
program together suffice to prove the preemptive safety of the most concrete
concurrent program.

Layered programs have been implemented in CIVL, a deductive verifier for
concurrent programs, implemented as a conservative extension to the Boogie
verifier [3]. CIVL has been used to verify a complex concurrent garbage
collector [6] and a state-of-the-art data-race detection algorithm [15]. In
addition to these two large benchmarks, around fifty smaller programs
(including a ticket lock and a lock-free stack) are available at
https://github.com/boogie-org/boogie.

There are several directions for future work. We did not discuss how to verify 
an individual checker program. CIVL uses the Owicki-Gries method [13] and rely- 
guarantee reasoning [8] to verify checker programs. But researchers are exploring 
many different techniques for verification of concurrent programs. It would be 
interesting to investigate whether heterogeneous techniques could be brought to 
bear on checker programs at different layers. 

In this paper, we focused exclusively on verification and did not discuss code 
generation, an essential aspect of any programming system targeting the con- 
struction of verified programs. There is a lot of work to be done in connecting 
the most concrete program in a concurrent layered program to executable code. 
Most likely, different execution platforms will impose different obligations on 
the most concrete program and the general idea of layered concurrent programs 
would be specialized for different target platforms. 

Scalable verification is a challenge as the size of programs being verified
increases. Traditionally, scalability has been addressed using modular verifica- 
tion techniques but only for single-layer programs. It would be interesting to 
explore modularity techniques for concurrent layered programs in the context of 
a refinement-oriented proof system. 

Layered concurrent programs bring new challenges and opportunities to the 
design of programming languages and development environments. Integrating 
layers into a programming language requires intuitive syntax to specify layer 
information and atomic actions. For example, ordered layer names can be more 
readable and easier to refactor than layer numbers. An integrated development 
environment could provide different views of the layered concurrent program. For 
example, it could show the concurrent program, the checker program, and the 
introduced code at a particular layer. Any updates made in these views should 
be automatically reflected back into the layered concurrent program. 

Abstract. We present an extension of propositional dynamic logic called
HOT-PDL for specifying temporal properties of higher-order functional
programs. The semantics of HOT-PDL is defined over Higher-Order Traces (HOTs)
that model execution traces of higher-order programs. A HOT is a sequence of
events such as function calls and returns, equipped with two kinds of pointers
inspired by the notion of justification pointers from game semantics: one for
capturing the correspondence between call and return events, and the other for
capturing higher-order control flow involving a function that is passed to or
returned by a higher-order function. To allow traversal of the new kinds of
pointers, HOT-PDL extends PDL with new path expressions. The extension enables
HOT-PDL to specify interesting properties of higher-order programs, including
stack-based access control properties and those definable using dependent
refinement types. We show that HOT-PDL model checking of higher-order
functional programs over bounded integers is decidable via a reduction to modal
μ-calculus model checking of higher-order recursion schemes.


1 Introduction 


Temporal verification of higher-order programs has been an emerging research
topic [12,14,18,22-24,26,27,31,34]. The specification languages used there are
(ω-)regular word languages (that subsume LTL) [12,18,26] and modal μ-calculus
(that subsumes CTL) [14,24,31], which are interpreted over sequences or trees
consisting of events. (Extended) dependent refinement types are also used to
specify temporal [23,27] and branching properties [34]. These specification
languages, however, cannot sufficiently express specifications of control flow
involving (higher-order) functions. For example, let us consider the following
simple higher-order program $D_{tw}$ (in OCaml syntax):

let tw f x = f (f x) in let inc x = x + 1 in let r = * in tw inc r

Here, * denotes a non-deterministic integer, and the higher-order function
tw : (int → int) → int → int applies its function argument f : int → int to the
integer argument x twice. For example, for r = 0, the program $D_{tw}$ exhibits
the following call-by-value reduction sequence (with the redexes underlined).


$$\underline{\mathtt{tw}\ \mathtt{inc}}\ 0 \longrightarrow \underline{(\lambda x.\mathtt{inc}\ (\mathtt{inc}\ x))\ 0} \longrightarrow \mathtt{inc}\ (\underline{\mathtt{inc}\ 0}) \longrightarrow^{*} \underline{\mathtt{inc}\ 1} \longrightarrow^{*} 2$$


© The Author(s) 2018 
H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 105-123, 2018. 
https://doi.org/10.1007/978-3-319-96145-3_6

Example properties of the program $D_{tw}$ that cannot be expressed by the
previous specification languages are:


Prop.1. If the function returned by a partial application of tw to some function
(e.g., λx.inc (inc x) in the above sequence) is called with some integer n,
the function argument passed to tw (i.e., inc) is eventually called with n.

Prop.2. If the function returned by a partial application of tw to some function
is never called, then the function argument passed to tw is never called.


To remedy the limitation, we introduce a notion of Higher-Order Trace 
(HOT) that captures the control flow of higher-order programs and propose 
a dynamic logic over HOTs called Higher-Order Trace Propositional Dynamic 
Logic (HOT-PDL) for specifying temporal properties of higher-order programs. 

Intuitively, a HOT models a program execution trace which is a possibly 
infinite sequence of events such as function calls and returns with information 
about actual arguments and return values. Furthermore, HOTs are equipped 
with two kinds of pointers to enable precise specification of control flow: one 
for capturing the correspondence between call and return events, and the other 
for capturing higher-order control flow involving a function that is passed to or 
returned by a higher-order function. The two kinds of pointers are inspired by 
the notion of justification pointers from the game semantics of PCF [1,2,19,20]. 

For the higher-order program $D_{tw}$, for r = 0, we get the following HOT
$G_{tw}$:¹


call(tw, e)  ret(tw, e)  call(e, 0)  call(e, 0) ··· ret(e, 1)  call(e, 1) ··· ret(e, 2)  ret(e, 2)

[Pointers labeled CR, CC, and RC connect these events as described in points
(1)-(3) below.]


Here, e represents some function value, call(f, v) represents a call event of
the function f with the argument v, and ret(f, v) represents a return event of
the function f with the return value v. This trace corresponds to the previous
reduction sequence: the call events call(tw, e), call(e, 0), call(e, 0), and
call(e, 1) that occur in the trace in this order correspond respectively to the
redexes tw inc, (λx.inc (inc x)) 0, inc 0, and inc 1. The three important
points here are that (1) the call events have pointers labeled with CR to the
corresponding return events ret(tw, e), ret(e, 2), ret(e, 1), and ret(e, 2),
(2) the call event call(tw, e) has two pointers labeled with CC, where e
represents the function argument f of tw and the pointed call events call(e, 0)
and call(e, 1) represent the two calls to f in tw, and (3) the return event
ret(tw, e) has a pointer labeled with RC, where e represents the
partially-applied function λx.inc (inc x) and the pointed call event call(e, 0)
represents the call to the function.

To allow traversal of the pointers, HOT-PDL extends propositional dynamic
logic with new path expressions (see Sect. 3 for details). The extension enables
HOT-PDL to specify interesting properties of higher-order programs, including
stack-based access control properties and those definable using dependent
refinement types. Here, stack-based access control is a security mechanism
implemented in runtimes like the JVM for ensuring secure execution of programs
that have components with different levels of trust: the mechanism ensures that
a security-critical function (e.g., file access) is invoked only if all the
(immediate and indirect) callers in the current call stack are trusted, or one
of the callers is a privileged function and its callees are all trusted. We
introduce a new variant of stack-based access control properties for
higher-order programs, formalized in HOT-PDL from the point of view of
interactions among callers and callees.

¹ The symbol ··· indicates the omission of a subsequence. The two omitted
subsequences are call(inc, 0) ret(inc, 1) and call(inc, 1) ret(inc, 2) in this
order.

Compared to the previous specification languages with respect to
expressiveness, HOT-PDL subsumes (ω-)regular languages because PDL interpreted
over words is already as expressive as them [15]. Temporal logics over nested
words [6] such as CaRet [5] and NWTL [4] can capture the correspondence
between call and return events (i.e., pointers labeled with CR) but cannot
capture higher-order control flow (i.e., pointers labeled with CC and RC).
Branching properties (expressible in, e.g., CTL), however, are out of the scope
of the present paper, and such an extension of HOT-PDL remains an interesting
future direction. Dependent refinement types are often used to specify
properties of higher-order programs for partial- and total-correctness
verification [29,33,39,40]. For example, the following properties of the
program $D_{tw}$ are expressible:


Prop.3. The function yielded by applying tw to a strictly increasing function is 
strictly increasing. 

Prop.4. The function yielded by applying tw to a terminating function is termi- 
nating. 


This paper shows that HOT-PDL can encode such dependent refinement types.

We also study HOT-PDL model checking: given a higher-order program D
over bounded integers and a HOT-PDL formula φ, the problem is to decide
whether φ is satisfied by all the execution traces of D modeled as HOTs. We
show the decidability of HOT-PDL model checking via a reduction to modal
μ-calculus model checking of higher-order recursion schemes [21,28].

The rest of the paper is organized as follows. Section 2 formalizes HOTs
and explains how to use them to model execution traces of higher-order func- 
tional programs. Section 3 defines the syntax and the semantics of HOT-PDL and 
Sect. 4 shows how to encode stack-based access control properties and dependent 
refinement types in HOT-PDL. Section 5 discusses HOT-PDL model checking. 
We compare HOT-PDL with related work in Sect. 6 and conclude the paper with 
remarks on future work in Sect. 7. Omitted proofs are given in the extended ver- 
sion of this paper [30]. 


2 Higher-Order Traces


This section defines the notion of Higher-Order Trace (HOT), which is used to
model execution traces of higher-order programs. To this end, we first define
(Σ, Γ)-labeled directed graphs and DAGs.

Definition 1 ((Σ, Γ)-labeled directed graphs). Let Σ be a finite set of node
labels and Γ be a finite set of edge labels. A (Σ, Γ)-labeled directed graph is
defined as a triple (V, λ, ν), where V is a countable set of nodes,
$\lambda : V \to \Sigma$ is a node labeling function, and
$\nu : V \times V \to 2^{\Gamma}$ is an edge labeling function. We call a
(Σ, Γ)-labeled directed graph that has no directed cycle a (Σ, Γ)-labeled DAG.


Note that an edge may have multiple labels. For nodes $u, u' \in V$,
$\nu(u, u') = \emptyset$ means that there is no edge from u to u'. We use σ and
γ as meta-variables ranging respectively over Σ and Γ. We write $V_{\sigma}$ for
the set $\{u \in V \mid \sigma = \lambda(u)\}$ of all the nodes labeled with σ.
We also write $V_{S}$ for the set $\bigcup_{\sigma \in S} V_{\sigma}$. For
$u, u' \in V$, we write $u \lhd_{\gamma} u'$ if $\gamma \in \nu(u, u')$. A binary
relation $\lhd^{+}_{\gamma}$ (resp. $\lhd^{*}_{\gamma}$) denotes the transitive
(resp. reflexive and transitive) closure of $\lhd_{\gamma}$.


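Definition 2 (HOTs). A HOT is a (Σ, Γ)-DAG G = (V, λ, ν) that satisfies:


1. $V \neq \emptyset$, $\Gamma = \{N, CR, CC, RC\}$, $\Sigma = \Sigma_{call} \uplus \Sigma_{ret}$, and $\Sigma_{call} = \Sigma^{T}_{call} \uplus \Sigma^{A}_{call}$.
2. $\lhd_{CR} \subseteq (V_{\Sigma_{call}} \times V_{\Sigma_{ret}})$, $\lhd_{CC} \subseteq (V_{\Sigma_{call}} \times V_{\Sigma^{A}_{call}})$, and $\lhd_{RC} \subseteq (V_{\Sigma_{ret}} \times V_{\Sigma^{A}_{call}})$.
3. The elements of V are linearly ordered by $\lhd_{N}$.
4. If $u \lhd_{CR} u'$ and $u \lhd_{CR} u''$, then $u' = u''$.
5. For all $u' \in V_{\Sigma_{ret}}$, there uniquely exists $u \in V_{\Sigma_{call}}$ such that $u \lhd_{CR} u'$ holds.
6. For all $u' \in V_{\Sigma^{A}_{call}}$, there uniquely exists $u \in V$ such that $u \lhd_{CC} u'$ or $u \lhd_{RC} u'$ holds.
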
Intuitively, $\Sigma_{call}$ (resp. $\Sigma_{ret}$) represents a set of call
(resp. return) events. $\Sigma^{T}_{call}$ (resp. $\Sigma^{A}_{call}$)
represents a set of call events of top-level functions (resp. functions that
are returned by or passed to (higher-order) functions). $u \lhd_{N} u'$ means
that u' is the next event of u in the trace. $u \lhd_{CR} u'$ indicates that u'
is the return event corresponding to the call event u. $u \lhd_{CC} u'$
represents that u' is a call event of the function argument passed at the call
event u. $u \lhd_{RC} u'$ means that u' is a call event of the partially-applied
function returned at the return event u. We call the minimum node of a HOT G
with respect to $\lhd_{N}$ the root node, denoted by $0_{G}$. For HOTs $G_1$ and
$G_2$, we say $G_1$ is a prefix of $G_2$ and write $G_1 \preceq G_2$, if $G_1$
is a sub-graph of $G_2$ such that $0_{G_1} = 0_{G_2}$. Note that the HOT
$G_{tw}$ in Sect. 1, where N-labeled edges are omitted, satisfies the above
conditions, with
$\{call(tw, e), call(inc, 0), call(inc, 1)\} \subseteq \Sigma^{T}_{call}$,
$\{call(e, 0), call(e, 1)\} \subseteq \Sigma^{A}_{call}$, and
$\{ret(tw, e), ret(inc, 1), ret(inc, 2), ret(e, 1), ret(e, 2)\} \subseteq \Sigma_{ret}$.
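
The conditions of Definition 2 can be checked mechanically on a finite HOT. The following Python sketch is only an illustration under stated assumptions: the encoding (labels as strings, edges as sets of node pairs, nodes numbered in trace order) and the names `kind` and `violated_conditions` are hypothetical and not part of the paper, and the example graph is the HOT of the running example with the events of main included and N-labeled edges made explicit.

```python
# Node labels of the running example (events of main included), numbered from 1.
LABELS = ["call(main,0)", "call(tw,e)", "ret(tw,e)", "call(e,0)", "call(e,0)",
          "call(inc,0)", "ret(inc,1)", "ret(e,1)", "call(e,1)", "call(inc,1)",
          "ret(inc,2)", "ret(e,2)", "ret(e,2)", "ret(main,2)"]
LAB = dict(enumerate(LABELS, start=1))
EDGES = {
    "N": {(j, j + 1) for j in range(1, len(LABELS))},
    "CR": {(1, 14), (2, 3), (4, 13), (5, 8), (6, 7), (9, 12), (10, 11)},
    "CC": {(2, 5), (2, 9)},
    "RC": {(3, 4)},
}


def kind(label):
    """Classify a label: 'ret', 'anon_call' (a call of e), or 'top_call'."""
    if label.startswith("ret("):
        return "ret"
    return "anon_call" if label.startswith("call(e,") else "top_call"


def violated_conditions(labels, edges):
    """Return the numbers of the conditions 2-6 of Definition 2 that fail."""
    nodes = set(labels)
    k = {u: kind(labels[u]) for u in nodes}
    bad = set()
    # Condition 2: CR goes call -> return; CC and RC end in calls of e.
    if not all(k[u] != "ret" and k[v] == "ret" for u, v in edges["CR"]):
        bad.add(2)
    if not all(k[u] != "ret" and k[v] == "anon_call" for u, v in edges["CC"]):
        bad.add(2)
    if not all(k[u] == "ret" and k[v] == "anon_call" for u, v in edges["RC"]):
        bad.add(2)
    # Condition 3 (finite case, nodes numbered in trace order):
    # the N-edges chain all nodes into a single line.
    order = sorted(nodes)
    if edges["N"] != set(zip(order, order[1:])):
        bad.add(3)
    for u in nodes:
        # Condition 4: a node has at most one CR-successor.
        if len({v for a, v in edges["CR"] if a == u}) > 1:
            bad.add(4)
        # Condition 5: every return has exactly one CR-predecessor.
        if k[u] == "ret" and len({a for a, v in edges["CR"] if v == u}) != 1:
            bad.add(5)
        # Condition 6: every call of e has exactly one CC/RC-predecessor.
        if k[u] == "anon_call":
            preds = {a for a, v in edges["CC"] | edges["RC"] if v == u}
            if len(preds) != 1:
                bad.add(6)
    return sorted(bad)


print(violated_conditions(LAB, EDGES))                    # no violations
print(violated_conditions(LAB, {**EDGES, "RC": set()}))   # node 4 loses its pointer
```

Dropping the RC edge leaves the call of the returned function (node 4) without any incoming CC or RC pointer, which is exactly what condition 6 rules out.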


2.1 Trace Semantics for Higher-Order Functional Programs 


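We now formalize our target language $\mathcal{L}$, which is an ML-like typed
call-by-value higher-order functional language. The syntax is defined by


(programs)     $D ::= \{f_1 \mapsto \lambda x_1.e_1, \ldots, f_m \mapsto \lambda x_m.e_m\}$
(expressions)  $e ::= x \mid f \mid \lambda x.e \mid e_1\ e_2 \mid n \mid op(e_1, e_2) \mid \mathtt{ifz}\ e_1\ e_2\ e_3$
(values)       $v ::= f \mid \lambda x.e \mid n$
(types)        $\tau ::= \mathtt{int} \mid \tau_1 \to \tau_2$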


Here, x and f are meta-variables ranging respectively over term variables and
names of top-level functions. The meta-variable n ranges over the set of
bounded integers $Z_b = \{n_{min}, \ldots, n_{max}\} \subseteq Z$. For
simplicity of presentation, $\mathcal{L}$ has the type int of bounded integers
as the only base type. op represents binary operators such as +, -, ×, =, and >.
The binary relations = and > return an integer that encodes a boolean value
(e.g., 1 for true and 0 for false). A program D maps each top-level function
name $f_i$ to its definition $\lambda x_i.e_i$. We write dom(D) for
$\{f_1, \ldots, f_m\}$. We assume that D has the main function main of the type
int → int. The functions in D can be mutually recursive. Expressions e comprise
variables x, function names f, lambda abstractions λx.e, function applications
$e_1\ e_2$, bounded integers n, binary operations $op(e_1, e_2)$, and
conditional branches $\mathtt{ifz}\ e_1\ e_2\ e_3$. We assume that expressions
are simply-typed. As usual, the simple type system guarantees that an
evaluation of a typed expression never causes a runtime type mismatch like
1 + λx.x. An expression $\mathtt{ifz}\ e_1\ e_2\ e_3$ evaluates to $e_2$ (resp.
$e_3$) if $e_1$ evaluates to 0 (resp. a non-zero integer). For example, the
program $D_{tw}$ in Sect. 1 is defined in $\mathcal{L}$ as follows:


$$D_{tw} = \{\mathtt{tw} \mapsto \lambda f.\lambda x.f\ (f\ x),\ \ \mathtt{inc} \mapsto \lambda x.x + 1,\ \ \mathtt{main} \mapsto \lambda r.\mathtt{tw}\ \mathtt{inc}\ r\}$$
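
To make the syntax and the informal evaluation rules concrete, here is a minimal environment-based evaluator, written in Python purely for illustration. The tuple encoding of expressions, the names `evaluate`, `OPS`, and `D_TW`, and the use of closures instead of substitution are assumptions of this sketch, not the paper's formalization; it also does not enforce the bounded-integer range and records no events.

```python
import operator

# Binary operators; "=" and ">" return 1 for true and 0 for false.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "=": lambda a, b: int(a == b), ">": lambda a, b: int(a > b)}


def evaluate(prog, e, env=None):
    """Call-by-value evaluation of expression e under program prog."""
    env = {} if env is None else env
    tag = e[0]
    if tag == "int":                       # bounded integer n (range not enforced)
        return e[1]
    if tag == "var":                       # term variable x
        return env[e[1]]
    if tag == "fun":                       # top-level function name f
        return evaluate(prog, prog[e[1]], {})
    if tag == "lam":                       # ("lam", x, body) evaluates to a closure
        return ("closure", e[1], e[2], env)
    if tag == "app":
        fn = evaluate(prog, e[1], env)     # function position first ...
        arg = evaluate(prog, e[2], env)    # ... then the argument (call-by-value)
        _, x, body, closure_env = fn
        return evaluate(prog, body, {**closure_env, x: arg})
    if tag == "op":
        return OPS[e[1]](evaluate(prog, e[2], env), evaluate(prog, e[3], env))
    if tag == "ifz":                       # second branch is taken on 0
        cond = evaluate(prog, e[1], env)
        return evaluate(prog, e[2] if cond == 0 else e[3], env)
    raise ValueError(f"unknown expression: {e!r}")


# The running example: tw, inc, and main.
D_TW = {
    "tw": ("lam", "f", ("lam", "x",
           ("app", ("var", "f"), ("app", ("var", "f"), ("var", "x"))))),
    "inc": ("lam", "x", ("op", "+", ("var", "x"), ("int", 1))),
    "main": ("lam", "r",
             ("app", ("app", ("fun", "tw"), ("fun", "inc")), ("var", "r"))),
}

result = evaluate(D_TW, ("app", ("fun", "main"), ("int", 0)))
print(result)
```

Because top-level names are looked up in `prog` on demand, mutually recursive definitions work without extra machinery.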


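Domains

(configurations)   $C ::= (I, E[e])$
(eval. contexts)   $E ::= [\,] \mid E\ e \mid v\ E \mid op(E, e) \mid op(v, E) \mid \mathtt{ifz}\ E\ e_1\ e_2 \mid ret(h, i, E)$
(interfaces)       $I ::= \{h_1 \mapsto^{i_1} v_1, \ldots, h_m \mapsto^{i_m} v_m\}$
(handles)          $h ::= n \mid f \mid \lfloor h \rfloor_i \mid \lceil h \rceil_i$
(events)           $\alpha ::= call(h_1, i, h_2) \mid ret(h_1, i, h_2)$


Derivation Rules

(APP)   $(I, E[(\lambda x.e)\ v]) \xrightarrow{\epsilon} (I, E[[v/x]e])$

(OP)    $\dfrac{n = [\![op]\!](n_1, n_2)}{(I, E[op(n_1, n_2)]) \xrightarrow{\epsilon} (I, E[n])}$

(IFZ)   $(I, E[\mathtt{ifz}\ 0\ e_1\ e_2]) \xrightarrow{\epsilon} (I, E[e_1])$

(IFN)   $\dfrac{n \neq 0}{(I, E[\mathtt{ifz}\ n\ e_1\ e_2]) \xrightarrow{\epsilon} (I, E[e_2])}$

(CINT)  $\dfrac{(h \mapsto^{i} v) \in I \qquad \alpha = call(h, i, n) \qquad I' = I\{h \mapsto^{i+1} v\}}{(I, E[h\ n]) \xrightarrow{\alpha} (I', E[ret(h, i, v\ n)])}$

(CFUN)  $\dfrac{v \text{ is a function} \qquad (h \mapsto^{i} v') \in I \qquad \alpha = call(h, i, \lfloor h \rfloor_i) \qquad I' = I\{h \mapsto^{i+1} v'\}\{\lfloor h \rfloor_i \mapsto^{0} v\}}{(I, E[h\ v]) \xrightarrow{\alpha} (I', E[ret(h, i, v'\ \lfloor h \rfloor_i)])}$

(RINT)  $\dfrac{\alpha = ret(h, i, n)}{(I, E[ret(h, i, n)]) \xrightarrow{\alpha} (I, E[n])}$

(RFUN)  $\dfrac{v \text{ is a function} \qquad \alpha = ret(h, i, \lceil h \rceil_i) \qquad I' = I\{\lceil h \rceil_i \mapsto^{0} v\}}{(I, E[ret(h, i, v)]) \xrightarrow{\alpha} (I', E[\lceil h \rceil_i])}$

(REFL)  $C \stackrel{\epsilon}{\Longrightarrow} C$

(TRAN)  $\dfrac{C \xrightarrow{\alpha} C' \qquad C' \stackrel{\varpi}{\Longrightarrow} C''}{C \stackrel{\alpha \cdot \varpi}{\Longrightarrow} C''}$

(TRANω) $\dfrac{C \xrightarrow{\alpha} C' \qquad C' \stackrel{\pi}{\Longrightarrow} \bot}{C \stackrel{\alpha \cdot \pi}{\Longrightarrow} \bot}$ (interpreted co-inductively)


Fig. 1. Labeled transition relations ($\xrightarrow{\alpha}$) and ($\stackrel{\varpi}{\Longrightarrow}$, $\stackrel{\pi}{\Longrightarrow} \bot$) for $\mathcal{L}$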

[Figure: step-by-step reduction of main 0 under the program $D_{tw}$, listing
the interface and the evaluation context used at each step; the events emitted
along the reduction form the finite trace given in the text.]

Fig. 2. Example trace of $D_{tw}$


We now introduce a trace semantics of the language $\mathcal{L}$, which will be
used in Sect. 5 to define our model checking problems of higher-order programs.
In the trace semantics, a program execution trace is represented by a sequence
of function call and return events without an explicit representation of
pointers but with enough information to construct them. We will explain how to
model traces of $\mathcal{L}$ as HOTs by presenting a translation.

The trace semantics $[\![D]\!]$ of the language $\mathcal{L}$ is defined as
$[\![D]\!]_{fin} \cup [\![D]\!]_{inf}$, where

$$[\![D]\!]_{fin} = \{\varpi \mid (I, \mathtt{main}\ n) \stackrel{\varpi}{\Longrightarrow} C\} \qquad\text{and}\qquad [\![D]\!]_{inf} = \{\pi \mid (I, \mathtt{main}\ n) \stackrel{\pi}{\Longrightarrow} \bot\}$$

are respectively the sets of finite and infinite execution traces obtained by
evaluating main n for some integer n using the trace-labeled multi-step
reduction relations $\stackrel{\varpi}{\Longrightarrow}$ and
$\stackrel{\pi}{\Longrightarrow} \bot$, which are presented in Fig. 1, under the
program $I = \{f \mapsto^{0} v \mid f \mapsto v \in D\}$ annotated with the
number of calls to each function that have occurred so far (i.e., initialized
to 0). There, we use $\varpi$ (resp. $\pi$) as a meta-variable ranging over
finite sequences $\alpha_1 \cdots \alpha_m$ (resp. infinite sequences
$\alpha_1 \cdot \alpha_2 \cdots$) of events $\alpha_i$. We write $\epsilon$ for
the empty sequence, $\varpi_1 \cdot \varpi_2$ for the concatenation of the
sequences $\varpi_1$ and $\varpi_2$, and $|\varpi|$ for the length of $\varpi$.
An event $\alpha$ is either of the form $call(h_1, i, h_2)$ or
$ret(h_1, i, h_2)$, where a handle h represents a top-level function or a
runtime value exchanged among functions. An event $call(h_1, i, h_2)$
represents the (i+1)-th call to the function $h_1$ with the argument $h_2$. On
the other hand, an event $ret(h_1, i, h_2)$ represents the return of the
(i+1)-th call to the function $h_1$ with the return value $h_2$. We thus equip
call and return events of $h_1$ with the information about (1) the number i of
the calls to $h_1$ that have occurred so far and (2) the runtime value $h_2$
passed to or returned by $h_1$, so that we can construct pointers (see
Definition 3 for details). Note here that handles h are also equipped with
meta-information necessary for constructing pointers. More specifically, h is
any of the following: a bounded integer n, a top-level function name
$f \in dom(D)$, the special identifier $\lfloor h \rfloor_i$ for the function
argument of the (i+1)-th call to the higher-order function h, or the special
identifier $\lceil h \rceil_i$ for the partially-applied function returned by
the (i+1)-th call to h. We thus use handles to track for each function value
where it is constructed and how many times it is called. We shall assume that
the syntax of expressions e and values v is also extended with handles h. As we
have seen, the finite traces $[\![D]\!]_{fin}$ of a program D are collected
using the terminating trace-labeled multi-step reduction relation
$\stackrel{\varpi}{\Longrightarrow}$ on configurations. A configuration
(I, E[e]) is a pair of an interface I and an expression E[e] consisting of an
evaluation context E and a sub-expression e under evaluation. A special
evaluation context ret(h, i, E) represents the calling context of the (i+1)-th
call to h that waits for the return value computed by E.


An interface I is defined to be
$\{h_1 \mapsto^{i_1} v_1, \ldots, h_m \mapsto^{i_m} v_m\}$ that maps each
function handle $h_j$ to its definition $v_j$, where $i_j$ records the number of
calls to the function $h_j$ that have occurred so far. In the derivation rules
for $\xrightarrow{\alpha}$, $[\![op]\!]$ represents the integer function denoted
by op, and $I\{h \mapsto^{i} v\}$ represents the interface obtained from I by
adding (or replacing the existing assignment to h with) the assignment
$h \mapsto^{i} v$. In the rule CINT (resp. RINT) for function calls (resp.
returns) with an integer n, the reduction relation is labeled with call(h, i, n)
(resp. ret(h, i, n)). By contrast, in the rule CFUN (resp. RFUN) for function
calls (resp. returns) with a function value v, the special identifier
$\lfloor h \rfloor_i$ (resp. $\lceil h \rceil_i$) for v is used in the label
$call(h, i, \lfloor h \rfloor_i)$ (resp. $ret(h, i, \lceil h \rceil_i)$) of the
reduction relation, and v in the expression is replaced by the identifier. For
example, as shown in Fig. 2, the following finite trace $\varpi_{tw}$ is
generated from the program $D_{tw}$:


$call(main, 0, 0) \cdot call(tw, 0, \lfloor tw \rfloor_0) \cdot ret(tw, 0, \lceil tw \rceil_0) \cdot call(\lceil tw \rceil_0, 0, 0) \cdot$
$call(\lfloor tw \rfloor_0, 0, 0) \cdot call(inc, 0, 0) \cdot ret(inc, 0, 1) \cdot ret(\lfloor tw \rfloor_0, 0, 1) \cdot call(\lfloor tw \rfloor_0, 1, 1) \cdot$
$call(inc, 1, 1) \cdot ret(inc, 1, 2) \cdot ret(\lfloor tw \rfloor_0, 1, 2) \cdot ret(\lceil tw \rceil_0, 0, 2) \cdot ret(main, 0, 2)$
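
The bookkeeping behind this trace (per-handle call counters, and fresh identifiers for function arguments and function results) can be imitated by instrumenting ordinary functions. The Python sketch below is an illustration only, not the paper's reduction relation: `make_recorder` and `wrap` are hypothetical names, and the strings `floor(h,i)` and `ceil(h,i)` stand in for the two special identifiers.

```python
from collections import defaultdict


def make_recorder():
    """Return an empty trace and a wrapper that logs call/ret events into it."""
    trace = []                   # events (kind, handle, i, value)
    calls = defaultdict(int)     # number of calls to each handle so far

    def wrap(handle, fn):
        def wrapped(arg):
            i = calls[handle]
            calls[handle] += 1
            if callable(arg):    # function argument: name it, as in rule CFUN
                arg_handle = f"floor({handle},{i})"
                arg = wrap(arg_handle, arg)
                trace.append(("call", handle, i, arg_handle))
            else:                # integer argument, as in rule CINT
                trace.append(("call", handle, i, arg))
            result = fn(arg)
            if callable(result):  # function result: name it, as in rule RFUN
                res_handle = f"ceil({handle},{i})"
                result = wrap(res_handle, result)
                trace.append(("ret", handle, i, res_handle))
            else:                 # integer result, as in rule RINT
                trace.append(("ret", handle, i, result))
            return result
        return wrapped

    return trace, wrap


trace, wrap = make_recorder()
inc = wrap("inc", lambda x: x + 1)
tw = wrap("tw", lambda f: lambda x: f(f(x)))
main = wrap("main", lambda r: tw(inc)(r))

value = main(0)
for event in trace:
    print(event)
```

Running `main(0)` records fourteen events in the same order as the trace shown above, with `floor(tw,0)` naming the argument passed to tw and `ceil(tw,0)` naming the function it returns.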


Similarly, the infinite traces $[\![D]\!]_{inf}$ of a program D are collected
using the non-terminating trace-labeled reduction relation
$C \stackrel{\pi}{\Longrightarrow} \bot$ on configurations. Intuitively,
$C \stackrel{\pi}{\Longrightarrow} \bot$ means that an execution from the
configuration C diverges, producing an infinite event sequence $\pi$. In the
rule TRANω, the double horizontal line represents that the rule is interpreted
co-inductively.

We now define the translation from traces $[\![D]\!]$ to HOTs with
$\Sigma^{T}_{call} = \{call(f, n), call(f, e) \mid f \in dom(D), n \in Z_b\}$,
$\Sigma^{A}_{call} = \{call(e, n), call(e, e) \mid n \in Z_b\}$, and
$\Sigma_{ret} = \{ret(f, n), ret(f, e), ret(e, n), ret(e, e) \mid f \in dom(D), n \in Z_b\}$.
We shall write Σ(D) for $\Sigma^{T}_{call} \cup \Sigma^{A}_{call} \cup \Sigma_{ret}$.
Note that Σ(D) is finite because dom(D) and $Z_b$ are finite. We write
$|\alpha|$ for the element of Σ(D) obtained from the event $\alpha$ by dropping
the second argument and replacing $\lfloor h \rfloor_i$ and $\lceil h \rceil_i$
by e. For example, we get $|call(tw, 0, \lfloor tw \rfloor_0)| = call(tw, e)$.


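Definition 3 (Finite Traces to HOTs). Given a finite trace
$\varpi = \alpha_1 \cdots \alpha_m \in [\![D]\!]_{fin}$ with m > 0, the
corresponding HOT $G_{\varpi} = (V_{\varpi}, \lambda_{\varpi}, \nu_{\varpi})$ is
defined by:


- $V_{\varpi} = \{1, \ldots, m\}$,

- $\lambda_{\varpi} = \{j \mapsto |\alpha_j| \mid j \in V_{\varpi}\}$, and

- $\nu_{\varpi}$ is the smallest relation that satisfies: for any $j_1, j_2 \in V_{\varpi}$,

  $j_1 \lhd_{N} j_2$ if $j_2 = j_1 + 1$,

  $j_1 \lhd_{CR} j_2$ if $\exists h, h', h'', i.\ \alpha_{j_1} = call(h, i, h') \wedge \alpha_{j_2} = ret(h, i, h'')$,

  $j_1 \lhd_{CC} j_2$ if $\exists h, h', h'', i, i'.\ \alpha_{j_1} = call(h', i, h) \wedge \alpha_{j_2} = call(h, i', h'')$,

  $j_1 \lhd_{RC} j_2$ if $\exists h, h', h'', i, i'.\ \alpha_{j_1} = ret(h', i, h) \wedge \alpha_{j_2} = call(h, i', h'')$.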


For example, the HOT $G_{tw}$ in Sect. 1 is translated from the finite trace
$\varpi_{tw}$ defined above (with the call and return events of main omitted).

For an infinite trace $\pi = \alpha_1 \cdot \alpha_2 \cdots \in [\![D]\!]_{inf}$,
the HOT $G_{\pi} = (V_{\pi}, \lambda_{\pi}, \nu_{\pi})$ is defined similarly for
$V_{\pi} = \{j \in N \mid j \geq 1\}$ and
$\lambda_{\pi} = \{j \mapsto |\alpha_j| \mid j \in V_{\pi}\}$.
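
The translation of Definition 3 is easy to run on the example trace. The Python sketch below is a hedged illustration: events are encoded as tuples (kind, handle, count, value), the strings `floor(tw,0)` and `ceil(tw,0)` stand in for the two special identifiers, and the names `erase`, `label`, and `hot_of_trace` are hypothetical.

```python
FLOOR, CEIL = "floor(tw,0)", "ceil(tw,0)"   # stand-ins for the special identifiers

# The finite trace of the running example, events of main included.
TRACE = [("call", "main", 0, 0), ("call", "tw", 0, FLOOR), ("ret", "tw", 0, CEIL),
         ("call", CEIL, 0, 0), ("call", FLOOR, 0, 0), ("call", "inc", 0, 0),
         ("ret", "inc", 0, 1), ("ret", FLOOR, 0, 1), ("call", FLOOR, 1, 1),
         ("call", "inc", 1, 1), ("ret", "inc", 1, 2), ("ret", FLOOR, 1, 2),
         ("ret", CEIL, 0, 2), ("ret", "main", 0, 2)]


def erase(handle):
    """Replace a special identifier by the placeholder e; keep anything else."""
    special = isinstance(handle, str) and handle.startswith(("floor(", "ceil("))
    return "e" if special else handle


def label(event):
    """Node label |alpha|: drop the call counter and erase special identifiers."""
    kind, handle, _count, value = event
    return f"{kind}({erase(handle)},{erase(value)})"


def hot_of_trace(trace):
    """Build node labels and the N/CR/CC/RC edge sets from a finite trace."""
    labels = {j: label(ev) for j, ev in enumerate(trace, start=1)}
    edges = {"N": set(), "CR": set(), "CC": set(), "RC": set()}
    for j1, (k1, h1, i1, v1) in enumerate(trace, start=1):
        for j2, (k2, h2, i2, _v2) in enumerate(trace, start=1):
            if j2 == j1 + 1:
                edges["N"].add((j1, j2))
            # Same handle and same call counter: a call and its return.
            if k1 == "call" and k2 == "ret" and (h1, i1) == (h2, i2):
                edges["CR"].add((j1, j2))
            # The value passed (resp. returned) at j1 is the function called at j2.
            # Integer values never match, since event handles are always strings.
            if k1 == "call" and k2 == "call" and v1 == h2:
                edges["CC"].add((j1, j2))
            if k1 == "ret" and k2 == "call" and v1 == h2:
                edges["RC"].add((j1, j2))
    return labels, edges


labels, edges = hot_of_trace(TRACE)
print(labels)
print({name: sorted(pairs) for name, pairs in edges.items()})
```

On this trace the call of tw (node 2) gets two CC pointers, one per call of its argument, and the return of tw (node 3) gets a single RC pointer to the call of the returned function.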


3 Propositional Dynamic Logic over Higher-Order Traces 


This section presents HOT-PDL, a propositional dynamic logic (PDL) defined
over HOTs (see [16] for a general exposition of PDL). HOT-PDL extends path
expressions of PDL with $\to_{ret}$ and $\to_{call}$ for traversing edges of
HOTs labeled respectively with CR and CC/RC. The syntax is defined by:


(formulas)          $\phi ::= p \mid \phi_1 \wedge \phi_2 \mid \neg\phi \mid [\pi]\,\phi$
(path expressions)  $\pi ::= \to \mid \to_{call} \mid \to_{ret} \mid \{\phi\}? \mid \pi_1 \cdot \pi_2 \mid \pi_1 + \pi_2 \mid \pi^{*}$


Here, p is a meta-variable ranging over atomic propositions AP. Let ⊤ and ⊥
denote tautology and contradiction, respectively. Path expressions π are
defined using a syntax based on regular expressions: we have concatenation
$\pi_1 \cdot \pi_2$, alternation $\pi_1 + \pi_2$, and Kleene star $\pi^{*}$. We
write $\pi^{+}$ for $\pi \cdot \pi^{*}$. Path expressions $\to$, $\to_{ret}$,
and $\to_{call}$ are for traversing edges labeled with N, CR, and CC or RC,
respectively. A path expression $\{\phi\}?$ is for testing if φ holds at the
current node. A formula $[\pi]\,\phi$ means that φ always holds if one moves
along any path represented by the path expression π. The dual formula
$\langle\pi\rangle\,\phi$ is defined by $\neg[\pi]\,\neg\phi$ and means that
there is a path represented by π such that φ holds if one moves along the path.
$\langle\pi\rangle$ and $[\pi]$ have the same priority as ¬.

[Figure: diagram omitted; the nodes are numbered as follows.]

1: call(main, 0), 2: call(tw, e), 3: ret(tw, e), 4: call(e, 0), 5: call(e, 0),
6: call(inc, 0), 7: ret(inc, 1), 8: ret(e, 1), 9: call(e, 1), 10: call(inc, 1),
11: ret(inc, 2), 12: ret(e, 2), 13: ret(e, 2), 14: ret(main, 2)

Fig. 3. The pairs of nodes in $G_{tw}$ related by CR or $\nearrow_{F}$


We now define the semantics of HOT-PDL. For a given HOT G = (V, λ, ν)
with Σ = AP, λ(u) represents the atomic proposition satisfied at the node
$u \in V$. We define the semantics $[\![\phi]\!]_G$ of a formula φ as the set of
all nodes $u \in V$ where φ is satisfied, and the semantics $[\![\pi]\!]_G$ of a
path expression π as the set of all pairs $(u_1, u_2) \in V \times V$ such that
one can move along π from $u_1$ to $u_2$.


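$$
\begin{aligned}
[\![p]\!]_G &= \{u \in V \mid p = \lambda(u)\} \\
[\![\phi_1 \wedge \phi_2]\!]_G &= [\![\phi_1]\!]_G \cap [\![\phi_2]\!]_G \\
[\![\neg\phi]\!]_G &= V \setminus [\![\phi]\!]_G \\
[\![[\pi]\,\phi]\!]_G &= \{u \in V \mid \forall u'.\ ((u, u') \in [\![\pi]\!]_G \Rightarrow u' \in [\![\phi]\!]_G)\} \\
[\![\to]\!]_G &= \lhd_{N} \qquad [\![\to_{ret}]\!]_G = \lhd_{CR} \qquad [\![\to_{call}]\!]_G = \lhd_{CC} \cup \lhd_{RC} \\
[\![\{\phi\}?]\!]_G &= \{(u, u) \in V \times V \mid u \in [\![\phi]\!]_G\} \\
[\![\pi_1 \cdot \pi_2]\!]_G &= \{(u_1, u_3) \in V \times V \mid \exists u_2 \in V.\ (u_1, u_2) \in [\![\pi_1]\!]_G \wedge (u_2, u_3) \in [\![\pi_2]\!]_G\} \\
[\![\pi_1 + \pi_2]\!]_G &= [\![\pi_1]\!]_G \cup [\![\pi_2]\!]_G \\
[\![\pi^{*}]\!]_G &= \bigcup_{m \geq 0} [\![\pi]\!]_G^{m}
\end{aligned}
$$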

Here, for a binary relation R, $R^{m}$ denotes the m-th power of R. Note that
this semantics can interpret a given HOT-PDL formula over both finite and
infinite HOTs. $[\![p]\!]_G$ consists of all nodes labeled by p.
$[\![[\pi]\,\phi]\!]_G$ contains all nodes from which we always reach a node in
$[\![\phi]\!]_G$ if we take a path represented by π. $[\![\to]\!]_G$,
$[\![\to_{ret}]\!]_G$, and $[\![\to_{call}]\!]_G$ contain the pairs of nodes
linked by an edge labeled by N, CR, and CC or RC, respectively. We write
$G \models \phi$ if $0_G \in [\![\phi]\!]_G$. For example, let us consider the
HOT $G_{tw}$ and AP = Σ($D_{tw}$). Then,
$[\![\langle\to\rangle\, ret(tw, e)]\!]_{G_{tw}}$ consists of the node labeled
by call(tw, e). $[\![\langle\to_{ret}\rangle\, ret(e, 2)]\!]_{G_{tw}}$ consists
of a node labeled by call(e, 0) and the node labeled by call(e, 1).
$[\![\langle\to_{call}\rangle\, call(e, 0)]\!]_{G_{tw}}$ consists of the two
nodes respectively labeled by call(tw, e) and ret(tw, e). The example
properties of $D_{tw}$ discussed in Sect. 1 can be expressed as follows:


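Prop.1.: $[\to^{*}] \bigwedge_{x \in Z_b} ((call(tw, e) \wedge \langle\to_{ret} \cdot \to_{call}\rangle\, call(e, x)) \Rightarrow \langle\to_{call}\rangle\, call(e, x))$
Prop.2.: $[\to^{*}] ((call(tw, e) \wedge \neg\langle\to_{ret} \cdot \to_{call}\rangle\, \top) \Rightarrow \neg\langle\to_{call}\rangle\, \top)$


Here, $\bigwedge_{x \in Z_b} \phi$ abbreviates $[n_{min}/x]\phi \wedge \cdots \wedge [n_{max}/x]\phi$.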
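
On a finite HOT, the semantics above can be evaluated directly by computing sets of nodes and relations on nodes. The following Python sketch is an illustration under stated assumptions, not the model checking procedure of Sect. 5: formulas and path expressions are encoded as nested tuples, all helper names are hypothetical, the graph is the example HOT with the events of main included, and the range {0, 1, 2} passed to `prop1` is an arbitrary stand-in for the bounded integers.

```python
# Example HOT (events of main included); node j carries label LABELS[j-1].
LABELS = ["call(main,0)", "call(tw,e)", "ret(tw,e)", "call(e,0)", "call(e,0)",
          "call(inc,0)", "ret(inc,1)", "ret(e,1)", "call(e,1)", "call(inc,1)",
          "ret(inc,2)", "ret(e,2)", "ret(e,2)", "ret(main,2)"]
NODES = set(range(1, len(LABELS) + 1))
EDGES = {"N": {(j, j + 1) for j in range(1, len(LABELS))},
         "CR": {(1, 14), (2, 3), (4, 13), (5, 8), (6, 7), (9, 12), (10, 11)},
         "CC": {(2, 5), (2, 9)},
         "RC": {(3, 4)}}
G_TW = (NODES, dict(enumerate(LABELS, start=1)), EDGES)

# Constructors for formulas and path expressions (a hypothetical encoding).
NEXT, RET, CALL, TOP = ("next",), ("ret",), ("call",), ("true",)
def ap(label): return ("ap", label)
def neg(f): return ("not", f)
def conj(f, g): return ("and", f, g)
def implies(f, g): return neg(conj(f, neg(g)))
def box(p, f): return ("box", p, f)
def dia(p, f): return neg(box(p, neg(f)))     # <p> f  =  not [p] not f
def seq(p, q): return ("seq", p, q)
def star(p): return ("star", p)

def compose(r1, r2):
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}

def path_rel(G, p):
    """Set of node pairs connected by the path expression p."""
    nodes, _labels, edges = G
    tag = p[0]
    if tag == "next": return set(edges["N"])
    if tag == "ret": return set(edges["CR"])
    if tag == "call": return edges["CC"] | edges["RC"]
    if tag == "test": return {(u, u) for u in sat(G, p[1])}
    if tag == "seq": return compose(path_rel(G, p[1]), path_rel(G, p[2]))
    if tag == "alt": return path_rel(G, p[1]) | path_rel(G, p[2])
    if tag == "star":                          # reflexive-transitive closure
        step, closure = path_rel(G, p[1]), {(u, u) for u in nodes}
        while True:
            new = compose(closure, step) - closure
            if not new:
                return closure
            closure |= new
    raise ValueError(p)

def sat(G, f):
    """Set of nodes at which the formula f holds."""
    nodes, labels, _edges = G
    tag = f[0]
    if tag == "true": return set(nodes)
    if tag == "ap": return {u for u in nodes if labels[u] == f[1]}
    if tag == "not": return set(nodes) - sat(G, f[1])
    if tag == "and": return sat(G, f[1]) & sat(G, f[2])
    if tag == "box":
        rel, good = path_rel(G, f[1]), sat(G, f[2])
        return {u for u in nodes if all(v in good for a, v in rel if a == u)}
    raise ValueError(f)

def models(G, f, root=1):
    return root in sat(G, f)

def prop1(values):
    """Prop.1, with the big conjunction unfolded over the given integers."""
    body = TOP
    for x in values:
        target = ap(f"call(e,{x})")
        body = conj(body, implies(conj(ap("call(tw,e)"), dia(seq(RET, CALL), target)),
                                  dia(CALL, target)))
    return box(star(NEXT), body)

PROP2 = box(star(NEXT), implies(conj(ap("call(tw,e)"), neg(dia(seq(RET, CALL), TOP))),
                                neg(dia(CALL, TOP))))

print(sat(G_TW, dia(NEXT, ap("ret(tw,e)"))))   # the node labeled call(tw,e)
print(models(G_TW, prop1([0, 1, 2])), models(G_TW, PROP2))
```

Representing every path expression as an explicit relation is quadratic in the number of nodes, which is fine for a fourteen-node example but is not meant as a scalable algorithm.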
In Sect. 4, we show further examples that express interesting properties of
higher-order programs, including stack-based access control properties and
those definable using dependent refinement types. We here prepare notations
used there. First, we overload the symbols call, ret, and $call_T$ to denote
the path expressions $\{\bigvee \Sigma_{call}\}?$, $\{\bigvee \Sigma_{ret}\}?$,
and $\{\bigvee \Sigma^{T}_{call}\}?$, respectively. We write $\to_{F}$ for the
path expression $\to_{ret} \cdot \to$, which is used to move from a call event
to the next event of the caller (by skipping to the next event of the
corresponding return event). We also write $\nearrow_{F}$ for the path
expression $call \cdot \to \cdot \to_{F}^{*} \cdot call$, which is used to move
from a call event to any call event invoked by the callee. Figure 3 illustrates
the pairs of nodes in $G_{tw}$ related by $\nearrow_{F}$. To capture control
flow of higher-order programs, where function callers and callees may exchange
functions as values, we need to use CC- and RC-labeled edges. For example, an
event raised by the function argument $f_{arg}$ of a higher-order function f
could be regarded as an event of the caller g of f, because $f_{arg}$ is
constructed by g. Similarly, an event raised by the (partially-applied)
function $f_{ret}$ returned by a function f could be regarded as an event of f.
To formalize the idea, we introduce variants $\to_{H}$ and $\nearrow_{H}$ of
$\to_{F}$ and $\nearrow_{F}$ with higher-order control flow taken into
consideration: $\to_{H}$ denotes $(\to_{ret} \cdot \to) + (\to_{call} \cdot \to)$
and $\nearrow_{H}$ denotes $call_T \cdot \to \cdot \to_{H}^{*} \cdot call_T$.
Note that the source and the target of $\nearrow_{H}$ are restricted to call
events of top-level functions. Figure 4 illustrates the pairs of nodes in
$G_{tw}$ related by $\nearrow_{H}$, where nodes labeled with events of the same
function (in the sense discussed above) are arranged in the same horizontal
line.

[Figure: diagram omitted; the nodes are numbered as follows.]

1: call(main, 0), 2: call(tw, e), 3: ret(tw, e), 4: call(e, 0), 5: call(e, 0),
6: call(inc, 0), 7: ret(inc, 1), 8: ret(e, 1), 9: call(e, 1), 10: call(inc, 1),
11: ret(inc, 2), 12: ret(e, 2), 13: ret(e, 2), 14: ret(main, 2)

Fig. 4. The pairs of nodes in $G_{tw}$ related by CR, CC, RC, or $\nearrow_{H}$


4 Applications of HOT-PDL 


We show how to encode dependent refinement types and stack-based access 
control properties using HOT-PDL. 

4.1 Dependent Refinement Types


HOT-PDL can specify pre- and post-conditions of higher-order functions, by
encoding dependent refinement types τ for partial [29,33,40] and total
[23,27,34,36,39] correctness verification, defined as:
$\tau ::= \{\nu \mid \psi\} \mid (x : \tau_1) \to \tau_2^{Q}$. Here, Q is either
∀ or ∃. An integer refinement type $\{\nu \mid \psi\}$ is the type of bounded
integers ν that satisfy the refinement formula ψ over bounded integers. A
dependent function type $(x : \tau_1) \to \tau_2^{\forall}$ is the type of
functions that, for any argument x conforming to the type $\tau_1$, if
terminating, return a value conforming to the type $\tau_2$. By contrast,
$(x : \tau_1) \to \tau_2^{\exists}$ is the type of functions that, for any
argument x conforming to $\tau_1$, always terminate and return a value
conforming to $\tau_2$. For example, Prop.3 and Prop.4 of $D_{tw}$ are
expressed by the following types of tw:


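Prop.3.: $(f : (x : int) \to \{\nu \mid \nu > x\}^{\forall}) \to ((x : int) \to \{\nu \mid \nu > x\}^{\forall})^{\forall}$
Prop.4.: $(f : (x : int) \to int^{\exists}) \to ((x : int) \to int^{\exists})^{\forall}$

We here write int for $\{\nu \mid \top\}$. These types can be encoded in
HOT-PDL as:


Prop.3.: $call(tw, e) \Rightarrow ([\to_{call}]\, incr(e)) \wedge [\to_{ret}]\,(ret(tw, e) \Rightarrow [\to_{call}]\, incr(e))$
Prop.4.: $call(tw, e) \Rightarrow ([\to_{call}]\, term(e)) \wedge [\to_{ret}]\,(ret(tw, e) \Rightarrow [\to_{call}]\, term(e))$

Here, $incr(g) = \bigwedge_{x \in Z_b} call(g, x) \Rightarrow [\to_{ret}] \bigwedge_{y \in Z_b} (ret(g, y) \Rightarrow y > x)$ and
$term(g) = \bigwedge_{x \in Z_b} (call(g, x) \Rightarrow \langle\to_{ret}\rangle\, \top)$
for $g \in \{e\} \cup \{f \mid f \in dom(D)\}$. We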
now define a translation F from types to HOT-PDL formulas as follows: 


$F(g, (x : \tau_1) \to \tau_2^{Q}) = \bigwedge_{x \in |\tau_1|} (call(g, x) \Rightarrow F_{arg}(x, \tau_1) \wedge F_{ret}(g, \tau_2^{Q}))$ 
Iz: m1) > 73"| = (e |i | p}|=Z, 
v-"— mueiuns dT 0 F inaw) 
Farg( , ) [ call] F( ’ ) Farg( S: lY} H (if | [n/z]v) 


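$F_{ret}(g, \tau^{\forall}) = [\to_{ret}] \bigwedge_{x \in |\tau|} (ret(g, x) \Rightarrow F(x, \tau))$ 

$F_{ret}(g, \tau^{\exists}) = (\langle \to_{ret} \rangle \top) \wedge F_{ret}(g, \tau^{\forall})$ 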


4.2 Stack-Based Access Control Properties 


As briefly summarized in Sect. 1, stack-based access control [13] ensures that a 
security-critical function (e.g., file access) is invoked only if all the (immediate 
and indirect) callers in the current call stack are trusted, or one of the callers 
is a privileged function and its callees are all trusted. We here use HOT-PDL 
to specify stack-based access control properties for higher-order programs. Let 
Critical, Trusted, and Priv be HOT-PDL formulas that tell whether the current 
node is labeled with a call event of security-critical, trusted, and privileged 
functions, respectively. We assume that Critical, Priv, and ¬Trusted do not 
overlap each other, and a function in Priv can be directly called only from a 
function in Trusted. Then, one may think we can express the specification as: 


$SC \equiv [\nearrow_F^{*} \cdot \{\neg Trusted\}? \cdot (\nearrow_F \cdot \{\neg Priv\}?)^{*}]\, \neg Critical$ 


Here, the path expression $\nearrow_F$ introduced in Sect. 3 is used to traverse the call 
stack bottom-up. The above formula says that an invalid call stack never occurs, 
where a call stack is called invalid if it contains a call to an untrusted function 
(represented by the part $\nearrow_F^{*} \cdot \{\neg Trusted\}?$), followed by a call to a critical function 
(represented by Critical), with no intervening call to a privileged function 
(represented by $(\nearrow_F \cdot \{\neg Priv\}?)^{*}$). 
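To make the first-order reading of this property concrete, the following Python sketch checks a single call stack, listed from the bottom (the oldest caller) to the top. It is only an illustration under our own assumptions: the function name `stack_is_valid` and the representation of Trusted, Priv, and Critical as sets of function names are hypothetical, and the sketch inspects one stack rather than model checking a HOT.

```python
from typing import Iterable, Set


def stack_is_valid(stack: Iterable[str], trusted: Set[str],
                   privileged: Set[str], critical: Set[str]) -> bool:
    """Return False iff an untrusted call is followed by a critical call
    with no privileged call in between (stack is given bottom to top)."""
    tainted = False  # an untrusted caller is not yet covered by a privileged one
    for fn in stack:
        if fn in critical:
            if tainted:
                return False
        elif fn in privileged:
            tainted = False  # callers below a privileged function are ignored
        elif fn not in trusted:
            tainted = True
    return True


if __name__ == "__main__":
    trusted, privileged, critical = {"main", "trusted"}, {"privileged"}, {"critical"}
    print(stack_is_valid(["main", "untrusted", "critical"],
                         trusted, privileged, critical))
```

The check is a single pass because only the most recent untrusted or privileged caller matters for each critical call.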

This definition, however, is not sufficient for our higher-order language. Let 
us consider the following program $D_{pa}$, which involves a partial application: 


let untrusted () = λu. critical u 


let main () = untrusted () () 


Here, untrusted ∉ Trusted and critical ∈ Critical. Intuitively, $D_{pa}$ should 
be regarded as unsafe because critical in the body of untrusted is called. 
However, $D_{pa}$ satisfies the specification above (under the assumption that 
anonymous functions are in Trusted), because the partial application untrusted () 
never causes a call to critical but just returns the anonymous (and trusted) 
function λu. critical u. The following higher-order program $D_{ho}$ is yet another 
unsafe example that satisfies the specification: 


let privileged f = f () 
let trusted f = if test () then privileged f else () 
let untrusted () = trusted (λx. crash (); critical ()) 


let main () = untrusted () 


Here, privileged ∈ Priv, trusted ∈ Trusted, untrusted ∉ Trusted, and 
critical ∈ Critical. Note that critical in the body of untrusted is called 
as follows: the anonymous function λx. crash (); critical () is first passed to 
trusted and then to privileged (if test () returns true), and is finally called 
by privileged, causing a call to critical. 

To remedy the limitation, we introduce a new refined variant of stack-based 
access control properties for higher-order programs, formalized in HOT-PDL 
from the point of view of interactions among callers and callees as follows: 


$SC \equiv [\nearrow_H^{*} \cdot \{\neg Trusted\}? \cdot (\nearrow_H \cdot \{\neg Priv\}?)^{*}]\, \neg Critical$ 


Note that this is obtained from the previous version by just replacing $\nearrow_F$ with 
$\nearrow_H$, which takes into account which function constructed each function value 
exchanged among functions. The refined version rejects the unsafe $D_{pa}$ and $D_{ho}$ 
as intended: $D_{pa}$ (resp. $D_{ho}$) is rejected because the call event of λu. critical u 
(resp. λx. crash (); critical ()) is regarded as an event of untrusted. 
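The difference between the two readings can be illustrated by running the partial-application program with an explicit call stack. The Python sketch below is our own simplification and not the HOT-based semantics: the helper names (`run_dpa`, `enter`), the label `"<anon>"`, and the treatment of anonymous functions as trusted are assumptions made for illustration. With `attribute_to_constructor=False`, the call of the anonymous function is recorded as an anonymous (trusted) event; with `True`, it is recorded as an event of the function that constructed it.

```python
from typing import Callable, List

TRUSTED = {"main", "<anon>"}  # assumption: anonymous functions count as trusted
PRIVILEGED: set = set()       # the partial-application example has no privileged function
CRITICAL = {"critical"}


def stack_is_valid(stack: List[str]) -> bool:
    """False iff an untrusted call precedes a critical call with no privileged call between."""
    tainted = False
    for fn in stack:
        if fn in CRITICAL:
            if tainted:
                return False
        elif fn in PRIVILEGED:
            tainted = False
        elif fn not in TRUSTED:
            tainted = True
    return True


def run_dpa(attribute_to_constructor: bool) -> bool:
    """Run `let untrusted () = λu. critical u; let main () = untrusted () ()`."""
    stack: List[str] = []
    verdicts: List[bool] = []

    def enter(label: str, body: Callable[[], object]):
        stack.append(label)  # push the event owner for the duration of the call
        try:
            return body()
        finally:
            stack.pop()

    def critical(u):
        return enter("critical", lambda: verdicts.append(stack_is_valid(stack)))

    def untrusted():
        label = "untrusted" if attribute_to_constructor else "<anon>"
        closure = lambda u: enter(label, lambda: critical(u))
        return enter("untrusted", lambda: closure)

    enter("main", lambda: untrusted()(()))
    return all(verdicts)
```

Under these assumptions, `run_dpa(False)` accepts the program while `run_dpa(True)` rejects it, mirroring the intended effect of attributing a function value to its constructor.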




Fournet and Gordon [13] have studied variants of stack-based access control 
properties for a call-by-value higher-order language. We conclude this section by 
comparing ours with one of theirs called “stack inspection with frame capture”.² 
The ideas behind the two are similar but what follows illustrates the difference: 


let untrusted f = crash (); f () 
let trusted x = untrusted (λx. if test () then critical () else ()) 
let main () = trusted () 


This program satisfies ours but violates theirs. Note that ours allows a function 
originally constructed by a trusted function to invoke a critical function even 
if the function is passed around by an untrusted function. By contrast, in their 
definition, a trusted function value gets “contaminated” (i.e., disabled to invoke 
a critical function) once it is passed to or returned by an untrusted function. 
In some cases, their conservative policy is useful, but we believe ours would be 
more semantically robust (e.g., even works well with the CPS transformation). 


5 HOT-PDL Model Checking 


In this section, we define HOT-PDL model checking problems for higher-order 
functional programs over bounded integers and sketch a proof of the decidability. 


Definition 4 (HOT-PDL model checking). Given a program D and a HOT-PDL 
formula $\phi$ with AP = X(D), HOT-PDL model checking is the problem of 
deciding whether $G_{\varpi} \models \phi$ and $G_{\pi} \models \phi$ for all $\varpi \in [D]_{fin}$ and $\pi \in [D]_{inf}$. 


Theorem 1 (Decidability). HOT-PDL model checking is decidable. 


We show this by a reduction to modal μ-calculus (μ-ML) model checking 
of higher-order recursion schemes (HORSs), which is known to be decidable [21,28]. 
A HORS is a grammar for generating a (possibly infinite) ranked tree, and 
HORSs are essentially simply-typed lambda calculus with general recursion, tree 
constructors, and finite data domains such as booleans and bounded integers. 

In the reduction, we encode the set of HOTs that are generated from the given 
program D as a single tree (generated by a HORS). For example, Fig. 5 shows 
such a tree that encodes the HOTs of $D_{tw}$.³ There, a node labeled with end 
represents the termination of the program. Note that the branching at the root node 
is due to the input to the function main. The subtree with the root node labeled 
with call(main, 0) is obtained from the HOT $G_{\varpi}$ by appending a special node 
labeled with end, adding, for each edge with the label $\ell \in \{N, CR, CC, RC\}$, 
a new node labeled with $\ell$, and expanding the resulting DAG into a tree. Thus, 
the edge labels of $G_{\varpi}$ are turned into node labels of the tree. 
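The expansion step can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the HORS construction itself: the DAG is given as an adjacency dictionary, the names `expand` and `size` are hypothetical, and the special end node and the binary br nodes are omitted for brevity.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]   # (edge label, target node id)
Tree = Tuple[str, list]  # (node label, list of child trees)


def expand(dag: Dict[str, List[Edge]], labels: Dict[str, str], node: str) -> Tree:
    """Unfold the DAG from `node`; every edge becomes a tree node carrying the edge label."""
    return (labels[node],
            [(edge_label, [expand(dag, labels, target)])
             for edge_label, target in dag.get(node, [])])


def size(tree: Tree) -> int:
    """Number of nodes in a tree produced by `expand`."""
    return 1 + sum(size(child) for child in tree[1])


# A tiny first-order trace: main calls f, f returns, main returns.
labels = {"0": "call(main,0)", "1": "call(f,0)", "2": "ret(f,1)", "3": "ret(main,1)"}
dag = {"0": [("N", "1"), ("CR", "3")],
       "1": [("N", "2"), ("CR", "2")],
       "2": [("N", "3")]}
tree = expand(dag, labels, "0")
```

Shared nodes (here the two return events) are duplicated in the tree, which is why the unfolding is larger than the DAG it comes from.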


² We do not compare with the other variants in [13] because they are too syntactic to 
be preserved by simple program transformations like inlining. 

³ There, for simplicity, we illustrate an unranked tree and omit the label of branching 
nodes. In the formalization, we express an unranked tree as a binary tree using a 
special node label br of the arity 2 representing a binary branching. 
It is also worth mentioning here that we are allowed to expand DAGs into trees 
because the truth value of a HOT-PDL formula is not affected by node-sharing 
in the given HOT. This nice property is lost if we extend the path expressions of 
HOT-PDL, for example, with intersections. Thus, the decidability of model 
checking for extensions of HOT-PDL is an open problem. 

[Figure 5: the root branches to call(main, n_min), ..., call(main, 0), ..., call(main, n_max); deeper nodes include ret(tw, •), call(•, 0), and call(•, 1).] 

Fig. 5. A tree encoding the HOTs generated from $D_{tw}$ 


We next explain our translation from a HOT-PDL formula into a μ-ML 
formula interpreted over trees that encode HOTs. Our translation is based on 
an existing one for ordinary PDL [11]. The syntax of μ-ML is defined as follows: 


$\varphi ::= X \mid p \mid \neg\varphi \mid \varphi_1 \wedge \varphi_2 \mid \Box\varphi \mid \nu X.\varphi \mid \mu X.\varphi$ 


Here, $X$ represents a propositional variable and $p$ represents an atomic 
proposition. A formula $\Box\varphi$ means that $\varphi$ holds for any child of the current node. A 
formula $\mu X.\varphi$ (resp. $\nu X.\varphi$) represents the least (resp. greatest) fixpoint of the 
function $\lambda X.\varphi$. Here, we assume $X$ occurs only positively in $\varphi$. For example, the 
HOT-PDL formulas $[\to]\,p$, $[\to_{ret}]\,p$, and $[\to_{call}]\,p$ are respectively translated to 
μ-ML formulas: $\Box(\nu X.(N \Rightarrow \Box p) \wedge (br \Rightarrow \Box X))$, $\Box(\nu X.(CR \Rightarrow \Box p) \wedge (br \Rightarrow \Box X))$, 
and $\Box(\nu X.((CC \vee RC) \Rightarrow \Box p) \wedge (br \Rightarrow \Box X))$, where the greatest 
fixpoints are used to skip the branching nodes labeled with br (that may repeat 
infinitely). 
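On a finite tree, the three translated formulas can be evaluated by a direct recursion, since the greatest fixpoint then coincides with unfolding the definition. The Python sketch below illustrates this reading only; the tree representation, the names `box` and `skip_branching`, and the restriction to finite trees are our assumptions (the actual model checking is over possibly infinite trees generated by a HORS).

```python
from typing import Callable, Set, Tuple

Tree = Tuple[str, list]  # (node label, list of child trees)


def skip_branching(node: Tree, edge_labels: Set[str], p: Callable[[Tree], bool]) -> bool:
    """nu X. (E => box p) and (br => box X), evaluated on a finite tree."""
    label, children = node
    if label in edge_labels:   # an edge node of the wanted kind: check its targets
        return all(p(child) for child in children)
    if label == "br":          # a branching node: keep looking below it
        return all(skip_branching(child, edge_labels, p) for child in children)
    return True                # both implications hold vacuously


def box(node: Tree, edge_labels: Set[str], p: Callable[[Tree], bool]) -> bool:
    """box(nu X. ...): the fixpoint formula must hold at every child of `node`."""
    return all(skip_branching(child, edge_labels, p) for child in node[1])


leaf_call = ("call(f,0)", [])
leaf_ret = ("ret(main,1)", [])
root = ("call(main,0)", [("br", [("N", [leaf_call]), ("CR", [leaf_ret])])])
```

For instance, `box(root, {"N"}, p)` plays the role of $[\to]\,p$ at the root, and `box(root, {"CC", "RC"}, p)` that of $[\to_{call}]\,p$.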

Finally, we explain how to obtain a HORS for generating a tree that encodes 
the set of HOTs generated from the given program D. We here need to simulate 
pointer traversals of HOT-PDL by using purely functional features of HORSs 
because μ-ML does not support pointers. Intuitively, we obtain the desired 
HORS from D by embedding an event monitor and an event handler. Whenever 
the monitor detects a function call or return event during the execution of D, 
the handler creates a new node labeled with the event or ignores the event until 
a certain event is detected by the monitor, depending on the current mode of 
the handler. The handler has the following three modes: 


$m_N$: The handler always creates and links two new nodes $u_N$ and $u_\alpha$ labeled 
respectively with N and the event $\alpha$ observed. The handler then continues as 
follows, depending on the form of the event $\alpha$: 

    call(g, n): Spawns a new handler with the mode $m_{ret}$. Then, the two handlers 
    of the modes $m_N$ and $m_{ret}$ continue to create subtrees of $u_\alpha$. 

    call(g, •): Spawns two new handlers with the modes $m_{ret}$ and $m_{call}$. The 
    three handlers of $m_N$, $m_{ret}$, and $m_{call}$ continue to create subtrees of $u_\alpha$. 

    ret(g, n): The handler of the mode $m_N$ continues to create a subtree of $u_\alpha$. 

    ret(g, •): Spawns a new handler with the mode $m_{call}$. Then, the two handlers 
    of the modes $m_N$ and $m_{call}$ continue to create subtrees of $u_\alpha$. 

$m_{ret}$: The handler ignores all events but the return event corresponding to the 
call event that caused the spawn of the handler. If not ignored, the handler 
creates and links new nodes $u_{CR}$ and $u_\alpha$ labeled with CR and the event $\alpha$. 
The handler changes its mode to $m_N$ and continues creating a subtree of $u_\alpha$. 

$m_{call}$: The handler ignores all events but the call event of the function passed to 
or returned by the call or return event that caused the spawn of the handler. 
If not ignored, the handler creates and links new nodes $u$ and $u_\alpha$ labeled 
respectively with CC or RC and the event $\alpha$, duplicates itself, and changes 
the mode of the original to $m_N$. The handler of the mode $m_N$ (resp. $m_{call}$) 
continues to create a subtree of $u_\alpha$ (resp. the parent of $u$). 


For simplicity of the construction, we assume that D is in the Continuation- 
Passing Style (CPS). This does not lose generality because we can enforce this 
form by the CPS transformation. Because CPS explicates the order of function 
call and return events, it simplifies event monitoring, handling, and tracking of 
the current mode of the monitors, which often changes as monitoring proceeds. 


6 Related Work 


HOT-PDL can specify temporal trace properties of higher-order programs. An 
extension for specifying branching properties, however, remains future work. 
Logics and formal languages on richer structures than words have been proposed. 
Regular languages of nested words, or equivalently, Visibly Pushdown 
Languages (VPLs), were introduced by Alur and Madhusudan [7]. An 
(ω-)nested word is a (possibly infinite) word with additional well-nested 
pointers from call events to the corresponding return events. Compared to temporal 
logics CaRet [5] and NWTL [4] over (ω-)nested words, HOT-PDL is defined 
over HOTs that have richer structures. Recall that a HOT is equipped with 
two kinds of pointers: one kind with the label CR, which is the same as the 
pointers of nested words, and the other kind with the label CC or RC, which 
is newly introduced to capture higher-order control flow. Bollig et al. proposed 
nested traces as a generalization of nested words for modeling traces of concur- 
rent (first-order) recursive programs, and presented temporal logics over nested 
traces [8]. Nested traces, however, cannot model traces of higher-order programs. 
We expect a combination of our work with theirs enables us to specify temporal 
trace properties of concurrent and higher-order recursive programs. Cyriac et 
al. have recently introduced an extension of PDL defined over traces of order- 
2 collapsible pushdown systems (CPDS) [3]. Interestingly, their traces are also 
equipped with two kinds of pointers: one kind of pointers captures the correspondence 
between ordinary push and pop stack operations, and the other captures 
the correspondence between order-2 push and pop operations for second-order 
stacks. Our work deals with higher-order programs that correspond to order-n 
CPDS for arbitrary n. 

Finally, we compare HOT-PDL with existing logics defined over words. It 
is well known that LTL is less expressive than ω-regular languages [38]. To 
remedy the limitation of LTL, Wolper introduced ETL [38] that allows users 
to define new temporal operators using right-linear grammars. Henriksen and 
Thiagarajan proposed DLTL [17] that generalizes the until operator of LTL using 
regular expressions. Leucker and Sanchez proposed RLTL [25] that combines LTL 
and regular expressions. Vardi and Giacomo have introduced Linear Dynamic 
Logic (LDL), a variant of PDL interpreted over infinite words [15,35]. $LDL_f$, 
a variant of PDL interpreted over finite words, has also been studied in [15]. 
ETL, DLTL, RLTL, and LDL are as expressive as ω-regular languages. Note 
that HOT-PDL subsumes (ω-)regular languages because LDL and $LDL_f$ can 
be naturally embedded in HOT-PDL. (ω-)VPLs strictly subsume (ω-)regular 
languages. Though CaRet [5] and NWTL [4] are defined over nested words, they 
do not capture the full class of VPLs [10]. To remedy the limitation, VLTL [10] 
combines LTL and VRE [9] in the style of RLTL, where VRE is a generalization 
of regular expressions for VPLs. VLDL [37] extends LDL by replacing the path 
expressions with VPLs over finite words. VLTL and VLDL exactly characterize 
ω-VPLs. Because VPLs and HOT-PDL are incomparable, it remains future work 
to extend HOT-PDL to subsume (ω-)VPLs. 


7 Conclusion and Future Work 


We have presented HOT-PDL, an extension of PDL defined over HOTs that 
model execution traces of call-by-value and higher-order programs. HOT-PDL 
enables a precise specification of temporal trace properties of higher-order pro- 
grams and consequently provides a foundation for specification in various appli- 
cation domains including stack-based access control and dependent refinement 
types. We have also studied HOT-PDL model checking and presented a reduction 
method to modal μ-calculus model checking of higher-order recursion schemes. 

To further widen the scope of our approach, it is worth investigating how 
to adapt HOTs and HOT-PDL to call-by-name and/or effectful languages. To 
this end, it is natural to incorporate more ideas from achievements of game 
semantics [1,20,32] and extend HOTs with new kinds of events and pointers for 
capturing call-by-name and/or effectful computations. 
Abstract. We present new algorithms for proving program termina- 
tion and non-termination using syntax-guided synthesis. They exploit 
the symbolic encoding of programs and automatically construct a for- 
mal grammar for symbolic constraints that are used to synthesize either 
a termination argument or a non-terminating program refinement. The 
constraints are then added back to the program encoding, and an off- 
the-shelf constraint solver decides on their fitness and on the progress 
of the algorithms. The evaluation of our implementation, called FREQTERM, 
shows that although the formal grammar is limited to the syntax 
of the program, in the majority of cases our algorithms are effective 
and fast. Importantly, FREQTERM is competitive with state-of-the-art 
on a wide range of terminating and non-terminating benchmarks, and 
it significantly outperforms state-of-the-art on proving non-termination 
of a class of programs arising from large-scale Event-Condition-Action 
systems. 


1 Introduction 


Originating in the field of program synthesis, syntax-guided synthesis 
(SyGuS) [2] has recently been applied [14,16] to the verification of program 
safety. In general, a SyGuS-based method walks through a set of candidates, 
restricted by a formal grammar, and searches for a candidate that meets 
the predetermined specification. The distinguishing insight of [14,16], in which 
SyGuS discovers inductive invariants, is that a formal grammar need not nec- 
essarily be provided by the user (as in applications to program synthesis), but 
instead it could be automatically constructed on the fly from the symbolic encod- 
ing of the program being analyzed. Despite being incomplete, the approach shows 
remarkable practical success due to its ability to discover various facts about pro- 
gram behaviors whose syntactic representations are compact and look similar to 
the actual program statements. 

Problems of proving and disproving program termination have a known con- 
nection to safety verification, e.g., [7,19,28,39, 40]. In particular, to prove termi- 
nation, a program could be augmented by a counter (or a set of counters) that is 
initially assigned a reasonably large value and monotonically decreases at each 
iteration [38]. It remains to solve a safety verification task: to prove that the 
counter never goes negative. On the other hand, to prove that a program has 
only infinite traces, one could prove that the negation of a loop guard is never 
reachable, which boils down to another safety verification task. This knowledge 
motivates us not only to exploit safety verification as a subroutine in our tech- 
niques, but also to adapt successful methods across application domains. 

We present a set of SyGuS-based algorithms for proving and disproving ter- 
mination. For the former, our algorithm LINRANK adds a decrementing counter 
to a loop, iteratively guesses lower bounds on its initial value (using the syntactic 
patterns obtained from the code), which lead to the safety verification tasks to be 
solved by an off-the-shelf Horn solver. Existence of an inductive invariant guar- 
antees termination, and the algorithm converges. Otherwise LINRANK proceeds 
to strengthening the lower bounds by adding another guess. Similarly, our algo- 
rithm LEXRANK deals with a system of extra counters ordered lexicographically 
and thus enables termination analysis for a wider class of programs. 

For proving non-termination, we present a novel algorithm NONTERMREF 
that iteratively searches for a restriction on the loop guard that might lead to 
infinite traces. Since safety verification cannot in general answer such queries, we 
build NONTERMREF on top of a solver for the validity of $\forall\exists$-formulas. In 
particular, we prove that if at the beginning of any iteration the desired restriction is 
fulfilled, then there exists a sequence of states from the beginning to the end of 
that iteration, and the desired restriction is fulfilled at the end of that iteration 
as well. Recent symbolic techniques [15] to handle quantifier alternation enabled 
us to prove non-termination of a large class of programs for which a reduction 
to safety verification is not effective. 
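The shape of the $\forall\exists$ query can be illustrated by a brute-force check over a finite set of states. The Python sketch below is not NONTERMREF and does not use a solver: the function name `witnesses_nontermination`, the explicit successor function, and the finite enumeration are assumptions made only to show which condition is being checked.

```python
from typing import Callable, Iterable, List


def witnesses_nontermination(states: Iterable[int],
                             guard: Callable[[int], bool],
                             successors: Callable[[int], List[int]],
                             restriction: Callable[[int], bool]) -> bool:
    """For all states satisfying the restriction: the guard holds and there exists
    a successor that satisfies the restriction again (a forall-exists check)."""
    restricted = [s for s in states if restriction(s)]
    return bool(restricted) and all(
        guard(s) and any(restriction(t) for t in successors(s))
        for s in restricted)


# Loop `while (x != 0) x = x - 2;` restricted to odd x never reaches 0.
result = witnesses_nontermination(range(-9, 10),
                                  guard=lambda x: x != 0,
                                  successors=lambda x: [x - 2],
                                  restriction=lambda x: x % 2 != 0)
```

A restriction that passes this check keeps every iteration inside itself, so any trace that starts in it is infinite; whether some initial state satisfies the restriction has to be checked separately.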

These three algorithms are independent of each other, but they all rely on 
a generator of constraints that are further applied in different contexts. This 
distinguishes our work from most of the related approaches [7, 18, 20,23, 30, 32, 
36,39,40]. The key insight, adapted from [14,16], is that the syntactical struc- 
tures that appear in the program give rise to a formal grammar, from which 
many candidates could be sampled. Because the grammar is composed from a 
finite number of numeric constants, operators, and variable combinations, the 
number of sampled constraints is always finite. Furthermore, since our samples 
are syntactically close to the actual constructs which appear in the code, they 
often provide a practical guidance towards the proof of the task. Thus in the 
majority of cases, the algorithms converge with the successful result. 

We have implemented our algorithms in a tool called FREQTERM, which 
utilizes solvers for Satisfiability Modulo Theory (SMT) [11,15] and satisfiability 
of constrained Horn clauses [16,24,26]. These automatic provers become more 
robust and powerful every day, which affects performance of FREQTERM only 
positively. We have evaluated FREQTERM on a range of terminating and 
non-terminating programs taken from SVCOMP¹ and on large-scale benchmarks 
arising from Event-Condition-Action systems² (ECA). Compared to 
state-of-the-art termination analyzers [18,22,30], FREQTERM exhibits a competitive 
runtime, and achieves several orders of magnitude performance improvement while 
proving non-termination of ECAs. 

¹ Software Verification Competition, http://sv-comp.sosy-lab.org/. 

In the rest of the paper, we give background on automated verification 
(Sect. 2) and on SyGuS (Sect. 3); then we describe the application of SyGuS 
for proving termination (Sect. 4) and non-termination (Sect. 5). Finally, after 
reporting experimental results (Sect. 6), we overview related work (Sect. 7) and 
conclude the paper (Sect. 8). 


2 Background and Notation 


In this work, we formulate tasks arising in automated program analysis by 
encoding them to instances of the SMT problem [12]: for a given first-order formula $\varphi$ 
and a background theory, to decide whether there is an assignment $m$ of values 
from the theory to variables in $\varphi$ that makes $\varphi$ true (denoted $m \models \varphi$). If every 
satisfying assignment to $\varphi$ is also a satisfying assignment to some formula $\psi$, we write $\varphi \Rightarrow \psi$. 


Definition 1. A transition system P is a tuple $(V \cup V', Init, Tr)$, where $V$ is 
a vector of variables; $V'$ is its primed copy; formulas Init and Tr encode the 
initial states and the transition relation respectively. 


We view programs as transition systems and throughout the paper use both 
terms interchangeably. An assignment $s$ of values to all variables in $V$ (or any 
copy of $V$ such as $V'$) is called a state. A trace is a (possibly infinite) sequence 
of states $s_0, s_1, \ldots$, such that (1) $s_0 \models Init$, and (2) for each $i$, $(s_i, s_{i+1}) \models Tr$. 

We assume, without loss of generality, that the transition-relation formula 
$Tr(V, V')$ is in Conjunctive Normal Form, and we split $Tr(V, V')$ into a 
conjunction $Guard(V) \wedge Body(V, V')$, where $Guard(V)$ is the maximal subset of 
conjuncts of Tr expressed over variables just from $V$, and every conjunct of 
$Body(V, V')$ can have appearances of variables from $V$ and $V'$. 

Intuitively, formula $Guard(V)$ encodes a loop guard of the program, whose 
loop body is encoded in $Body(V, V')$. For example, for a program shown in 
Fig. 1a, $V = \{x, y, K\}$, $Guard = y < K \vee y > K$, and the entire encoding of 
the transition relation is shown in Fig. 1b. 


Definition 2. If each program trace contains a state $s$, such that $s \models \neg Guard$, 
then the program is called terminating (otherwise, it is called non-terminating). 


Tasks of proving termination and non-termination are often reduced to tasks 
of proving program safety. A safety verification task is a pair (P, Err), where 
$P = (V \cup V', Init, Tr)$ is a program, and Err is an encoding of the error states. 
It has a solution if there exists a formula, called a safe inductive invariant, that 
is implied by Init, is closed under Tr, and is inconsistent with Err. 


² Provided at http://rers-challenge.org/2012/index.php?page=problems. 
(a) 
    while (y != K) { 
      x = (x > K) ? x - 1 : (x < K) ? x + 1 : x; 
      y = (y > x) ? y - 1 : (y < x) ? y + 1 : y; 
    } 

(b) 
    $Tr(x, x', y, y', K, K') = (y < K \vee y > K) \wedge K' = K \wedge$ 
    $\quad x' = ite(x > K,\ x - 1,\ ite(x < K,\ x + 1,\ x)) \wedge$ 
    $\quad y' = ite(y > x',\ y - 1,\ ite(y < x',\ y + 1,\ y))$ 

(c) 
    $1 \cdot y + (-1) \cdot K > 0$ 
    $(-1) \cdot y + 1 \cdot K > 0$ 
    $1 \cdot x + (-1) \cdot K > 0$ 
    $(-1) \cdot x + 1 \cdot K > 0$ 

(d) 
    CONST ::= 0 
    COEF  ::= 1 | -1 
    VAR   ::= x | y | K 
    SUM   ::= COEF · VAR + COEF · VAR + CONST 
    INEQ  ::= SUM > 0 


Fig. 1. (a): C-code; (b): transition relation Tr (the framebox marks Guard, i.e., $y < K \vee y > K$); (c): formulas 
S extracted from Tr and normalized; (d): grammar that generalizes S. 


Definition 3. Let $P = (V \cup V', Init, Tr)$; a formula Inv is a safe inductive 
invariant if the following conditions hold: (1) $Init(V) \Rightarrow Inv(V)$, (2) $Inv(V) \wedge 
Tr(V, V') \Rightarrow Inv(V')$, and (3) $Inv(V) \wedge Err(V) \Rightarrow \bot$. 


If there exists a trace cex (called a counterexample) that contains a state $s$, 
such that $s \models Err$, then the safety verification task does not have a solution. 
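The three conditions of Definition 3 can be checked exhaustively when the state space is small. The following Python sketch does so for a toy one-variable system; it is an illustration only, with the function name `is_safe_inductive_invariant` and the enumeration over a finite list of states being our assumptions (a Horn solver works symbolically instead).

```python
from itertools import product
from typing import Callable, List


def is_safe_inductive_invariant(states: List[int],
                                init: Callable[[int], bool],
                                tr: Callable[[int, int], bool],
                                err: Callable[[int], bool],
                                inv: Callable[[int], bool]) -> bool:
    """Check conditions (1)-(3) of a safe inductive invariant over a finite state list."""
    initiation = all(inv(s) for s in states if init(s))                    # (1)
    consecution = all(inv(t) for s, t in product(states, repeat=2)
                      if inv(s) and tr(s, t))                              # (2)
    safety = not any(inv(s) and err(s) for s in states)                    # (3)
    return initiation and consecution and safety


# Toy system: x starts at 0 and is incremented while x < 5; error states have x > 5.
states = list(range(-2, 9))
init = lambda x: x == 0
tr = lambda x, x_next: x < 5 and x_next == x + 1
err = lambda x: x > 5
```

With these definitions, `lambda x: 0 <= x <= 5` passes all three checks, whereas `lambda x: x >= 0` fails (3) and `lambda x: x == 0` fails (2).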


3 Exploiting Program Syntax 


The key driver of our termination and non-termination provers is a generator 
of constraints which help to analyze the given program in different ways. The 
source code often gives useful information, e.g., of occurrences of variables, con- 
stants, arithmetic and comparison operators, that could bootstrap the formula 
generator. We rely on the SyGuS-based algorithm [16] introduced for verifying 
program safety. It automatically constructs the grammar G based on the fixed 
set of formulas S obtained by traversing parse trees of Init, Tr, and Err. In our 
case, Err is not given, so G is based only on Init and Tr. 

For simplicity, we require formulas in S to have the form of inequalities 
composed from a linear combination over either $V$ or $V'$ and a constant (e.g., 
$x' < y' + 1$ is included, but $x' = x + 1$ is excluded). Then, if needed, variables are 
deprimed (e.g., $x' < y' + 1$ is replaced by $x < y + 1$), and formulas are normalized, 
such that all terms are moved to the left side (e.g., $x < y + 1$ is replaced by 
$x - y - 1 < 0$), the subtraction is rewritten as addition, $<$ is rewritten as $>$, and 
respectively $\le$ as $\ge$ (e.g., $x - y - 1 < 0$ is replaced by $(-1) \cdot x + y + 1 > 0$). 
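The depriming and normalization steps can be written down directly. The Python sketch below is a minimal illustration under our own representation (a linear expression is a dictionary from variable names to integer coefficients, with the key `""` holding the constant); the function name `normalize` is hypothetical and this is not the implementation of [16].

```python
from typing import Dict, Tuple

Linear = Dict[str, int]  # variable name -> coefficient; key "" holds the constant


def normalize(lhs: Linear, op: str, rhs: Linear) -> Tuple[Linear, str]:
    """Deprime variables, move all terms to the left, and flip < / <= into > / >=.

    The result (coeffs, op) stands for `sum(coeffs[v] * v) + coeffs[""] op 0`.
    """
    diff: Linear = {}
    for side, sign in ((lhs, 1), (rhs, -1)):
        for var, coef in side.items():
            name = var.rstrip("'")  # depriming: x' becomes x
            diff[name] = diff.get(name, 0) + sign * coef
    if op in ("<", "<="):
        diff = {name: -coef for name, coef in diff.items()}
        op = ">" if op == "<" else ">="
    elif op not in (">", ">="):
        raise ValueError("only inequalities are kept in S")
    return diff, op


# x' < y' + 1 becomes (-1)*x + 1*y + 1 > 0
example = normalize({"x'": 1}, "<", {"y'": 1, "": 1})
```

Equalities such as $x' = x + 1$ are rejected by the sketch, matching the restriction to inequalities stated above.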

The entire process of creation of G is exemplified in Fig. 1. Production rules of 
G are constructed as follows: (1) the production rule for normalized inequalities 
(denoted INEQ) consists of choices corresponding to distinct types of inequalities 
in S, (2) the production rule for linear combinations (denoted SUM) consists 
of choices corresponding to distinct arities of inequalities in S, (3) production 
rules for variables, coefficients, and constants (denoted respectively VAR, COEF, 
and CONST) consist of choices corresponding respectively to distinct variables, 
coefficients, and constants that occur in inequalities in S. Note that the method 
of creation of G naturally extends to considering disjunctions and nonlinear 
arithmetic [16]. 


(a) [Plot: the worst-case dynamics of the program from Fig. 1a] 

(b) 
    ① $i > x - K \wedge i > K - x \wedge i > y - K \wedge i > K - y \wedge i > x - y \wedge i > y - x \Rightarrow Inv(x, y, i, K)$ 

    ② $Inv(x, y, i, K) \wedge (y < K \vee y > K) \wedge K' = K \wedge i' = i - 1 \wedge$ 
       $x' = ite(x > K,\ x - 1,\ ite(x < K,\ x + 1,\ x)) \wedge$ 
       $y' = ite(y > x',\ y - 1,\ ite(y < x',\ y + 1,\ y)) \Rightarrow Inv(x', y', i', K')$ 

    ③ $Inv(x, y, i, K) \wedge (y < K \vee y > K) \wedge i < 0 \Rightarrow \bot$ 


Fig. 2. (a): The worst-case dynamics of program from Fig. 1a; (b): the 
termination-argument validity check (the frameboxes mark lower bounds $\{\ell_j\}$ for $i$). 

Choices in production rules of grammar G can be further assigned proba- 
bilities based on frequencies of certain syntactic features (e.g., frequencies of 
particular constants or combinations of variables) that belong to the program's 
symbolic encoding. In the interest of saving space, we do not discuss it here and 
refer the reader to [16]. The generation of formulas from G is performed recur- 
sively by sampling from probability distributions assigned to rules. Note that the 
choice of distributions affects only the order in which formulas are sampled and 
does not affect which formulas can or cannot be sampled in principle (because 
the grammar is fixed). Thus, without loss of generality, it is sound to assume 
that all distributions are uniform. In the context of termination analysis, we are 
interested in formulas produced by rules INEQ and SUM. 
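Because the grammar is finite, the candidates produced by the SUM rule can simply be enumerated. The following Python sketch does this for the grammar of Fig. 1d under the uniform-distribution assumption discussed above; the names `all_sums` and `sample_all` are hypothetical, and drawing from a shuffled pool is our stand-in for sampling without repetition.

```python
import itertools
import random
from typing import Iterator, List, Tuple

COEF = [1, -1]
VAR = ["x", "y", "K"]
CONST = [0]

Sum = Tuple[int, str, int, str, int]  # COEF * VAR + COEF * VAR + CONST


def all_sums() -> List[Sum]:
    """Every expression derivable from SUM ::= COEF * VAR + COEF * VAR + CONST."""
    return list(itertools.product(COEF, VAR, COEF, VAR, CONST))


def sample_all(rng: random.Random) -> Iterator[Sum]:
    """Yield each SUM candidate exactly once, in a uniformly random order."""
    pool = all_sums()
    rng.shuffle(pool)
    yield from pool


candidates = all_sums()
```

The tuple `(1, "x", -1, "K", 0)` stands for $1 \cdot x + (-1) \cdot K + 0$; the order of sampling changes, but the set of reachable candidates does not.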


4 Proving Termination 


We start this section with a motivating example and then proceed to presenting 
the general-purpose algorithms for proving program termination. 


Algorithm 1. LINRANK(P): proving termination with linear termination argument 

    Input: P = (V ∪ V', Init, Tr) where Tr = Guard ∧ Body 
    Output: res ∈ {TERMINATES, UNKNOWN} 
     1  V ← V ∪ {i}; V' ← V' ∪ {i'}; 
     2  Tr ← Tr ∧ i' = i − 1; Err ← Guard ∧ i < 0; 
     3  G ← GETGRAMMARANDDISTRIBUTIONS(Init, Tr); 
     4  while CANSAMPLE(G) do 
     5      cand ← SAMPLE(G, SUM); 
     6      G ← ADJUST(G, cand); 
     7      if Init ⇒ i > cand then continue; 
     8      Init ← Init ∧ i > cand; 
     9      if ISSAFE(Init, Tr, Err) then return TERMINATES; 
    10  return UNKNOWN; 


Example 1. The program shown in Fig. 1a terminates. It operates on three 
integer variables, x, y, and K: in each iteration y gets closer to x, and x gets closer 
to K. Thus, the total number of values taken by y before it equals K is no 
bigger than the maximal distance among x, y, and K (in the following, denoted 
Max). The worst-case dynamics happens when initially x < y < K (shown in 
Fig. 2a); in other cases the program terminates even faster. To formally prove 
this, the program could be augmented by a so-called termination argument. For 
this example, it is simply a fresh variable i which is initially assigned Max (or 
any other value greater than Max) and which gets decremented by one in each 
iteration. The goal now is to prove that i never gets negative. Fig. 2b shows the 
encoding of this safety verification task (recall Definition 3). The existence of a 
solution to this task guarantees the safety of the augmented program, and thus, 
the termination of the original program. Most state-of-the-art Horn solvers are 
able to find a solution immediately. 
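The claim of Example 1 can be tested by simulation on a bounded range of inputs. The Python sketch below transcribes the loop of Fig. 1a, adds the counter i, and reports whether the error condition (the guard holds while i is negative) is ever met. It is a test over finitely many inputs rather than a proof, and the function names are ours.

```python
from typing import Optional, Tuple


def step(x: int, y: int, K: int) -> Tuple[int, int]:
    """One iteration of the loop body: x moves towards K, then y towards the new x."""
    x = x - 1 if x > K else (x + 1 if x < K else x)
    y = y - 1 if y > x else (y + 1 if y < x else y)
    return x, y


def counter_never_negative(x: int, y: int, K: int, i: Optional[int] = None) -> bool:
    """Run the augmented program; False iff Guard holds at some point while i < 0."""
    if i is None:
        i = max(abs(x - K), abs(y - K), abs(x - y))  # Max, the maximal distance
    while y != K:       # Guard
        if i < 0:       # Err: Guard holds and i < 0
            return False
        x, y = step(x, y, K)
        i -= 1
    return True
```

Starting the counter too low, for example `counter_never_negative(0, 1, 10, i=3)`, reaches the error condition, which is why the lower bounds on i matter.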


The main challenge in preparing the termination-argument validity check is 
the generation of lower bounds $\{\ell_j\}$ for $i$ in Init (e.g., conjunctions of the form 
$i > \ell_j$ in ① in Fig. 2b). We build on the insight that each $\ell_j$ could be constructed 
independently from the others, and then an inequality $i > \ell_j$ could be conjoined 
with Init, thus giving rise to a new safety verification task. For a generation 
of candidate inequalities, we utilize the algorithm from Sect. 3: all $\{\ell_j\}$ can be 
sampled from grammar G which is obtained in advance from Init and Tr. 

For example, all six formulas in ① in Fig. 2b: $x - K$, $K - x$, $y - K$, $K - y$, 
$x - y$, and $y - x$ belong to the grammar shown in Fig. 1d. Note that for proving 
termination it is not necessary to have the most precise lower bounds. Intuitively, 
the larger the initial value of $i$, the more iterations it will stay positive. Thus, it 
is sound to try formulas which are not even related to actual lower bounds at 
all and keep them conjoined with Init. 
4.1 Synthesizing Linear Termination Arguments 


Algorithm 1 shows an “enumerate-and-try” procedure to search for a linear 
termination argument that proves termination of a program P. To initialize this 
search, the algorithm introduces an extra counter variable $i$ and adds it to $V$ 
(respectively, its primed copy $i'$ gets added to $V'$) (line 1).³ Then the 
transition-relation formula Tr gets augmented by $i' = i - 1$, the decrement of the counter in 
the loop body. To specify a set of error states, Algorithm 1 introduces a formula 
Err (line 2): whenever the loop guard is satisfied and the value of counter $i$ is 
negative. Algorithm 1 then starts searching for large enough lower bounds for $i$ 
(i.e., a set of constraints over $V \cup \{i\}$ to be added to Init), such that no error 
state is ever reachable. 

Before the main loop of our synthesis procedure starts, various formulas are 
extracted from the symbolic encoding of P and generalized to a formal grammar 
(line 3). The grammar is used for an iterative probabilistic sampling of candidate 
formulas (line 5) that are further added to the validity check of the current 
termination argument (line 8). In particular, each new constraint over i has the 
form i > cand, where cand is produced by the SUM production rule described in 
Sect. 3. Once Init is strengthened by this constraint, a new safety verification 
condition is compiled and checked (line 9) by an off-the-shelf Horn solver. 

As a result of each safety check, either a formula satisfying Definition 3 or a 
counterexample cex witnessing reachability of an error state is generated. Exis- 
tence of an inductive invariant guarantees that the conjunction of all synthesized 
lower bounds for i is large enough to prove termination, and thus Algorithm 1 
converges. Otherwise, if grammar G still contains a formula that has not been 
considered yet, the synthesis loop iterates. 

For the algorithm to make progress, it must keep track of the strength of each new 
candidate cand. That is, cand should add more restrictions on i in Init. Otherwise, 
the outcome of the validity check (line 9) would be the same as in the previous 
iteration. For this reason, Algorithm 1 includes an important routine [16]: after 
each sampled candidate cand, it adjusts the probability distributions associated 
with the grammar, such that cand cannot be sampled again in future iterations 
(line 6). Additionally, it checks (line 7) whether a new constraint adds some value 
over the already accepted constraints. Consequently, our algorithm does not require 
explicit handling of counterexamples: if Init only gets stronger in each iteration, 
then the current cex is invalidated. In principle, the algorithm could explicitly 
store cex and check its consistency with each new cand; in our experiments, 
however, this did not lead to significant performance gains. 
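
The enumerate-and-try loop described above can be illustrated with a small Python sketch. It is not the authors' implementation: the Horn solver is replaced by a hypothetical `is_safe` routine that simply simulates a deterministic loop from a finite list of initial states, the candidates are given as an explicit list instead of being sampled from a grammar, and the "adds value" check of line 7 is omitted. All function and parameter names are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List

State = Dict[str, int]
Bound = Callable[[State], int]  # a candidate lower bound, evaluated on a state


def is_safe(init_states: Iterable[State], guard, body, bounds: List[Bound],
            max_steps: int = 10_000) -> bool:
    """Toy stand-in for the safety check (assumption: finite, explicit Init).

    The counter i starts at the smallest value allowed by Init /\\ i > bound
    for every bound; larger values only give the counter more slack.
    """
    for s0 in init_states:
        i = max(b(s0) for b in bounds) + 1
        s = dict(s0)
        for _ in range(max_steps):
            if not guard(s):
                break                 # loop exited without reaching Err
            if i < 0:
                return False          # Err: Guard holds and i is negative
            s, i = body(s), i - 1     # Tr augmented by i' = i - 1
        else:
            return False              # step budget exhausted: not proven safe
    return True


def lin_rank(init_states, guard, body, candidates: List[Bound]) -> str:
    """Try candidates one by one, keeping every bound conjoined with Init."""
    init_states = list(init_states)
    bounds: List[Bound] = []
    for cand in candidates:           # each candidate is considered once
        bounds.append(cand)           # Init <- Init /\ i > cand
        if is_safe(init_states, guard, body, bounds):
            return "TERMINATES"
    return "UNKNOWN"


# Hypothetical program: while (x > 0) x = x - 1, with x initially in 0..5.
countdown = lin_rank(
    init_states=[{"x": v} for v in range(6)],
    guard=lambda s: s["x"] > 0,
    body=lambda s: {"x": s["x"] - 1},
    candidates=[lambda s: 0, lambda s: s["x"]],
)

# Hypothetical program that never leaves its loop: while (x > 0) x = x.
stuck = lin_rank(
    init_states=[{"x": 1}],
    guard=lambda s: s["x"] > 0,
    body=lambda s: dict(s),
    candidates=[lambda s: 0, lambda s: s["x"]],
)
```

In this toy run the bound i > 0 alone is too weak, while adding i > x makes the error state unreachable, which mirrors how bounds accumulate in Init.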


Theorem 1. If Algorithm 1 returns TERMINATES for program P, then P termi- 
nates. 


Indeed, the verification condition, which is proven safe in the last iteration of 
Algorithm 1, corresponds to some program P’ that differs from P by the presence 
of variable i. The set of traces of P has a one-to-one correspondence with the 


3 Assume that initially set V does not contain i. 
Algorithm 2. LEXRANK(P): proving termination with lexicographic termination argument 
Input: P = (V ∪ V′, Init, Tr) where Tr = Guard ∧ Body 
Output: res ∈ {TERMINATES, UNKNOWN} 
 1  V ← V ∪ {i, j}; V′ ← V′ ∪ {i′, j′}; 
 2  Err ← Guard ∧ i < 0; jBounds ← ∅; 
 3  G, G′, G″ ← GETGRAMMARANDDISTRIBUTIONS(Init, Tr); 
 4  while CANSAMPLE(G) or CANSAMPLE(G′) or CANSAMPLE(G″) do 
 5      if NONDET() then 
 6          cand ← SAMPLE(G, SUM); G ← ADJUST(G, cand); 
 7          Init ← Init ∧ i > cand; 
 8      if NONDET() then 
 9          cand ← SAMPLE(G′, SUM); G′ ← ADJUST(G′, cand); 
10          Init ← Init ∧ j > cand; 
11      if NONDET() then 
12          cand ← SAMPLE(G″, SUM); G″ ← ADJUST(G″, cand); 
13          jBounds ← jBounds ∪ {j > cand}; 
14      Tr′ ← Tr ∧ ite(j > 0, i′ = i ∧ j′ = j − 1, i′ = i − 1 ∧ ⋀_{b ∈ jBounds} b); 
15      if ISSAFE(Init, Tr′, Err) then return TERMINATES; 
16  return UNKNOWN; 



set of traces of P’, such that each state reachable in P could be extended by a 
valuation of i to become a reachable state in P’. That is, P terminates iff P’ 
terminates, and P' terminates by construction: i is initially assigned a reasonably 
large value, monotonically decreases at each iteration, and never goes negative. 

We note that the loop in Algorithm 1 always executes only a finite number of 
iterations since G is constructed from a finite number of components, and in 
each iteration it gets adjusted to avoid re-sampling of the same candidates. 
However, an off-the-shelf Horn solver that checks validity of each candidate might 
not converge because the safety verification task is undecidable in general. To 
mitigate this obstacle, our implementation supports several state-of-the-art 
solvers and lets the user specify which one to use. 


4.2 Synthesizing Lexicographic Termination Arguments 


There is a wide class of terminating programs for which no linear termination 
argument exists. A commonly used approach to handle them is via a search for 
a so-called lexicographic termination argument that requires introducing two or 
more extra counters. A SyGuS-based instantiation of such a procedure for two 
counters is shown in Algorithm 2 (more counters could be handled similarly). 
Algorithm 2 has a similar structure to Algorithm 1: the initial program gets aug- 
mented by counters, formula Err is introduced, lower bounds for counters are 
iteratively sampled and added to Init and Tr, and the verification condition is 
checked for safety. 

The differences in Algorithm 2 are in how it handles two counters i and j, 
between which an implicit order is fixed. In particular, Err is still expressed over i 
only, but i gets decremented by one only when j equals zero (line 14). At the same 
time, j gets updated in each iteration: if it was equal to zero, it gets assigned 
a value satisfying the conjunction of constraints in an auxiliary set jBounds; 
otherwise it gets decremented by one. Algorithm 2 synthesizes jBounds as well as 
lower bounds for initial conditions over i and j. The sampling proceeds separately 
from three different grammars (lines 6, 9, and 12), and the samples are used in 
three different contexts (lines 7, 10, and 13 respectively). Optionally, Algorithm 2 
could be parametrized by a synthesis strategy that gives interpretations for each 
of the NONDET() calls (lines 5, 8, and 11 respectively). In the simplest case, each 
NONDET() call is replaced by ⊤, which means that in each iteration Algorithm 2 
needs to sample from all three grammars. Alternatively, NONDET() could be 
replaced by a method to identify only one grammar per iteration to be sampled 
from. 
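
To make the counter update of line 14 concrete, here is a minimal Python sketch of a single step of the pair (i, j). The function names are hypothetical, and `j_reset` stands for any value that satisfies the synthesized constraints in jBounds. The sketch also checks that each step decreases the pair lexicographically, which is the intuition behind the termination of the augmented program.

```python
from typing import Tuple


def lex_step(i: int, j: int, j_reset: int) -> Tuple[int, int]:
    """One counter update: ite(j > 0, (i, j - 1), (i - 1, j_reset))."""
    if j > 0:
        return i, j - 1          # inner counter still positive: only j moves
    return i - 1, j_reset        # j hit zero: decrement i and re-initialize j


def steps_until_negative(i: int, j: int, j_reset: int) -> int:
    """Count updates until i becomes negative (the condition used in Err)."""
    steps = 0
    while i >= 0:
        nxt = lex_step(i, j, j_reset)
        assert nxt < (i, j)      # tuples compare lexicographically in Python
        i, j = nxt
        steps += 1
    return steps


budget = steps_until_negative(2, 3, 3)
```

Starting from (2, 3) with a reset value of 3, the pair allows 12 loop iterations before i turns negative, whereas a single counter starting at 2 would allow only 3.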


Theorem 2. If Algorithm 2 returns TERMINATES for program P, then P termi- 
nates. 


The proof sketch for Theorem 2 is similar to the one for Theorem 1: an aug- 
mented program P’ terminates by construction (due to a mapping of values of 
(i, j) into ordinals), and its set of traces has a one-to-one correspondence with 
the set of traces of P. 


5 Proving Non-termination 


In this section, we aim at solving the opposite task to the one in Sect. 4, i.e., 
we wish to witness infinite program traces and thus, to prove program non- 
termination. However, in contrast to a traditional search for a single infinite 
trace, it is often easier to search for groups of infinite traces. 


Lemma 1. Program P = (V ∪ V′, Init, Tr) where Tr = Guard ∧ Body does not 
terminate if: 

1. there exists a state s, such that s ⊨ Init and s ⊨ Guard, 
2. for every state s, such that s ⊨ Guard, there exists a state s′, such that 
   s, s′ ⊨ Tr and s′ ⊨ Guard. 


The lemma distinguishes a class of programs, for which the following holds. 
First, the loop guard is reachable from the set of initial states. Second, whenever 
the loop guard is satisfied, there exists a transition to a state in which the loop 
guard is satisfied again. Therefore, each initial state s, from which the loop guard 
is reachable, gives rise to at least one infinite trace that starts with s. 
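
The two conditions of Lemma 1 can be checked by brute force when the state space is finite. The following Python sketch does exactly that; it is only an illustration under this finiteness assumption (the paper instead decides the conditions symbolically), and the function name and the example programs are hypothetical.

```python
from typing import Callable, Iterable, List


def lemma1_holds(states: Iterable[int],
                 init: Callable[[int], bool],
                 guard: Callable[[int], bool],
                 successors: Callable[[int], List[int]]) -> bool:
    """Check both conditions of Lemma 1 over an explicit, finite set of states."""
    states = list(states)
    # Condition 1: some initial state satisfies the loop guard.
    cond1 = any(init(s) and guard(s) for s in states)
    # Condition 2: every guard state has a successor that satisfies the guard.
    cond2 = all(any(guard(t) for t in successors(s))
                for s in states if guard(s))
    return cond1 and cond2


domain = range(-5, 6)

# while (x > 0) x = nondet() ? x : x - 1;   (x may always stay unchanged)
nondet_loop = lemma1_holds(domain, lambda x: x == 3, lambda x: x > 0,
                           lambda x: [x, x - 1])

# while (x > 0) x = x - 1;   (from x = 1 the only successor leaves the loop)
countdown_loop = lemma1_holds(domain, lambda x: x == 3, lambda x: x > 0,
                              lambda x: [x - 1])
```

The first loop satisfies the lemma because keeping x unchanged always preserves the guard; the second does not, which is expected since it terminates.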

Note that for programs with deterministic transition relations (like, e.g., in 
Fig. 1a), the check of the second condition of Lemma 1 reduces to deciding the 
(a)  while (y != K) { 
       x = (x > K) ? x - 1 : (*) ? x + 1 : x; 
       y = (y > x) ? y - 1 : (y < x) ? y + 1 : y; 
     } 

(b)  ∀x, y, K. x < K ∧ y < K ∧ (y < K ∨ y > K) ⟹ 
       ∃x′, y′, K′. K′ = K ∧ 
         x′ = ite(x > K, x − 1, x + 1 ∨ x) ∧ 
         y′ = ite(y > x′, y − 1, ite(y < x′, y + 1, y)) ∧ 
         x′ < K′ ∧ y′ < K′ ∧ (y′ < K′ ∨ y′ > K′) 

(c)  [plot omitted] 

Fig. 3. (a): A variant of the program from Fig. 1a; (b): the valid ∀∃-formula for its 
non-terminating refinement (in frameboxes: refined Guard-s); (c): an example of 
non-terminating dynamics, when the value of x (and eventually, y) never gets changed. 


satisfiability of a quantifier-free formula since each state can be transitioned to 
exactly one state. But if the transition relation is non-deterministic, the check 
reduces to deciding validity of a ∀∃-formula. Although handling quantifiers is in 
general hard, some recent approaches [15] are particularly tailored to solve this 
type of query efficiently. 

In practice, the conditions of Lemma 1 are too strict to be fulfilled for an arbi- 
trary program. However, to prove non-termination, it is sufficient to constrain 
the transition relation as long as it preserves at least one original transition and 
only then to apply Lemma 1. 


Definition 4. Given programs P = (V ∪ V′, Init, Tr) and P′ = (V ∪ V′, Init, Tr′), 
we say that P′ is a refinement of P if Tr′ ⟹ Tr. 


Intuitively, Definition 4 requires P and P’ to operate over the same sets of 
variables and to start from the same initial states. Furthermore, each transition 
allowed by Tr′ is also allowed by Tr. One way to refine P is to restrict Tr = 
Guard ∧ Body by conjoining either Guard, or Body, or both with some extra 
constraints (called refinement constraints). In this work, we propose to sample 
them from our automatically constructed formal grammar (recall Sect. 3). 


Example 2. Consider a program shown in Fig. 3a. It differs from the one shown 
in Fig. 1a by a non-deterministic choice in the second ite-statement. That is, y 
still moves towards x; but x moves towards K only when x > K, and otherwise 
x may always keep the initial value. The formal grammar generated for this 
program is the same as shown in Fig. 1d, and it contains constraints x < K 
and y < K. Lemma 1 does not apply for the program as is, but it does after 
refining Guard with those constraints. In particular, the ∀∃-formula in Fig. 3b is 
valid, and a witness to its validity is depicted in Fig. 3c: eventually both x and 
Algorithm 3. NONTERMREF(P): proving non-termination 
Input: P = (V ∪ V′, Init, Tr) where Tr = Guard ∧ Body 
Output: res ∈ {TERMINATES, DOES NOT TERMINATE, UNKNOWN} 
 1  if Init(V) ∧ Guard(V) ⟹ ⊥ then return TERMINATES; 
 2  Tr ← Tr ∧ GETINVS(Init, Tr); 
 3  G ← GETGRAMMARANDDISTRIBUTIONS(Init, Tr); 
 4  Refs ← ∅; Gramms ← ∅; Gramms.PUSH(G); 
 5  while true do 
 6      if ∀V. Guard(V) ∧ ⋀_{r ∈ Refs} r(V) ⟹ 
               ∃V′. Body(V, V′) ∧ Guard(V′) ∧ ⋀_{r ∈ Refs} r(V′) then 
 7          return DOES NOT TERMINATE; 
 8      cand ← ⊤; 
 9      while Guard(V) ∧ ⋀_{r ∈ Refs} r(V) ⟹ cand(V) or 
               Init(V) ∧ Guard(V) ∧ cand(V) ∧ ⋀_{r ∈ Refs} r(V) ⟹ ⊥ do 
10          if Refs = ∅ and ¬CANSAMPLE(G) then return UNKNOWN; 
11          if Refs ≠ ∅ and ¬CANSAMPLE(G) then 
12              Refs.POP(); 
13              Gramms.POP(); 
14              cand ← ⊤; G ← Gramms.TOP(); 
15              continue; 
16          cand ← SAMPLE(G, INEQ); 
17          G ← ADJUST(G, cand); 
18      Refs.PUSH(cand); 
19      Gramms.PUSH(G); 


y become equal and always remain smaller than K. Thus, the program does not 
terminate. 


5.1 Synthesizing Non-terminating Refinements 


The algorithm for proving a program’s non-termination is shown in Algorithm 3. 
It starts with a simple satisfiability check (line 1) which filters out programs that 
never reach the loop body (thus they immediately terminate). Then, the 
transition relation Tr gets strengthened by auxiliary inductive invariants obtained 
with the help of the initial states Init (line 2). The algorithm does not impose any 
specific requirements on the invariants (it is sound even for the trivial invariant 
⊤) or on the method that detects them. In many cases, auxiliary invariants make 
the algorithm converge faster. Similar to Algorithms 1-2, Algorithm 3 splits Init 
and Tr into a set of formulas and generalizes them to a grammar. The difference 
lies in the type of formulas sampled from the grammar (INEQ vs SUM) and their 
use in the synthesis loop: Algorithm 3 treats sampled candidates as refinement 
constraints and attempts to apply Lemma 1 (line 6). 


The algorithm maintains a stack of refinement constraints Refs. At the first 
iteration, Refs is empty, and thus the algorithm tries to apply Lemma 1 to the 
original program. For that application, a ∀∃-formula is constructed and checked 
for validity. Intuitively, the formula expresses the ability of Body to transition 
each state which satisfies Guard to a state which satisfies Guard as well. If the 
validity of the ∀∃-formula is proven, the algorithm converges (line 7). Otherwise, a 
refinement of P needs to be guessed. Thus, the algorithm samples a new formula 
(line 16) using the production rule INEQ, which is described in Sect. 3, pushes it 
to Refs, and iterates. Note that G permits formulas over V only (i.e., to restrict 
Guard); in principle, however, it can be extended to sample formulas over 
V ∪ V′ (thus, to restrict Body as well). 

For the algorithm to make progress, it must keep track of how each new candidate 
cand relates to the constraints already belonging to Refs. That is, cand should 
not be implied by Guard ∧ ⋀_{r ∈ Refs} r, since otherwise the ∀∃-formula in the 
next iteration would not change. Also, cand should not over-constrain the loop 
guard, and thus it is important to check that after adding cand to the constraints 
from Guard and Refs, the loop guard is still reachable from the initial states. 
Both these checks are performed before the sampling (line 9). After the sampling, 
the necessary adjustments to the probability distributions assigned to the 
production rules of the grammar [16] are applied to ensure that the same refinement 
candidates are not sampled again (line 17). 

Because by construction G cannot generate conjunctions of constraints, the 
algorithm handles conjunctions externally. This is useful when a single 
constraint is not enough for an application of Lemma 1 and needs to be strengthened 
by another constraint. On the other hand, it may also be necessary to withdraw 
some sampled candidates before converging. For this reason, Algorithm 3 
maintains a stack Gramms of grammars and handles it synchronously with stack Refs 
(lines 12-14 and 18-19). When all candidates from a grammar have been considered 
without success, the algorithm pops the latest candidate from Refs and 
rolls back to the grammar used in the previous iteration. Additionally, a 
maximum size of Refs can be specified to avoid considering too deep refinements. 
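
The interplay of the two stacks can be sketched in Python as follows. This is one possible reading of Algorithm 3, not the authors' code, and it rests on several labeled assumptions: the ∀∃ check is replaced by brute-force enumeration over a finite grid of states (so the verdict is only a bounded check, not a proof), candidates come from an explicit list and are taken in order rather than sampled probabilistically, ADJUST is modeled by removing the candidate in place, Init is taken to be unconstrained, and the maximum stack depth is not modeled. The program is the one from Fig. 3a with the two refinement constraints from Example 2.

```python
from itertools import product
from typing import Callable, List, Tuple

State = Tuple[int, int, int]          # (x, y, K)
Pred = Callable[[State], bool]


def nonterm_ref(states, init: Pred, guard: Pred, succ, candidates: List[Pred]):
    states = list(states)
    if not any(init(s) and guard(s) for s in states):         # line 1
        return "TERMINATES", []
    refs: List[Pred] = []
    gramms: List[List[Pred]] = [list(candidates)]             # lines 3-4

    def region(s: State) -> bool:
        """Guard conjoined with all refinement constraints on the stack."""
        return guard(s) and all(r(s) for r in refs)

    while True:
        # Line 6: every state of the region can step back into the region.
        if all(any(region(t) for t in succ(s)) for s in states if region(s)):
            return "DOES NOT TERMINATE", list(refs)
        cand = None
        while cand is None:                                   # lines 9-17
            grammar = gramms[-1]
            if not grammar:
                if not refs:
                    return "UNKNOWN", []                      # line 10
                refs.pop()                                    # lines 12-14
                gramms.pop()
                continue
            c = grammar.pop(0)        # SAMPLE + ADJUST (in place, in order)
            not_implied = any(region(s) and not c(s) for s in states)
            reachable = any(init(s) and region(s) and c(s) for s in states)
            if not_implied and reachable:
                cand = c
        refs.append(cand)                                     # line 18
        gramms.append(list(gramms[-1]))                       # line 19


def succ(s: State) -> List[State]:
    """Transitions of the program in Fig. 3a (x may keep its value)."""
    x, y, K = s
    out = []
    for x2 in ([x - 1] if x > K else [x + 1, x]):
        y2 = y - 1 if y > x2 else (y + 1 if y < x2 else y)
        out.append((x2, y2, K))
    return out


grid = list(product(range(-3, 4), repeat=3))
verdict, used = nonterm_ref(
    grid, init=lambda s: True, guard=lambda s: s[1] != s[2], succ=succ,
    candidates=[lambda s: s[0] < s[2], lambda s: s[1] < s[2]])
no_cands, _ = nonterm_ref(grid, lambda s: True, lambda s: s[1] != s[2], succ, [])
```

On this grid the check fails for the original guard and for x < K alone, and succeeds once both x < K and y < K are on the stack, which matches the narrative of Example 2.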


Theorem 3. If Algorithm 3 returns DOES NOT TERMINATE for program P, then 
P does not terminate. 


Indeed, constraints that belong to Refs in the last iteration of the algorithm 
give rise to a refinement P′ of P, such that P′ = (V ∪ V′, Init, Tr ∧ ⋀_{r ∈ Refs} r). 
The satisfiability check (line 9) and the validity check (line 6) passed, which 
correspond to the conditions of Lemma 1. Thus, P′ does not terminate, and 
consequently it has an infinite trace. Finally, since P′ refines P, all traces 
(including infinite ones) of P′ belong to P, and P does not terminate either. 
5.2 Integrating Algorithms Together 


With a few exceptions [30,39], existing algorithms address either the task of 
proving, or the task of disproving termination. The goal of this paper is to show 
that both tasks benefit from syntax-guided techniques. While an algorithmic 
integration of several orthogonal techniques is itself a challenging problem, it 
is not the focus of our paper. Still, we use a straightforward idea here. Since 
each presented algorithm has one big loop, an iteration of Algorithm 1 could be 
followed by an iteration of Algorithm 2 and in turn, by an iteration of Algorithm 3 
(i.e., in a lockstep fashion). A positive result obtained by any algorithm forces 
all remaining algorithms to terminate. Based on our experiments, provided in 
detail in Sect. 6, the majority of benchmarks were proven either terminating or 
non-terminating by one of the algorithms within seconds. This justifies why the 
lockstep execution of all algorithms in practice would not bring a significant 
overhead. 
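
The lockstep scheme can be sketched with Python generators. This is only an illustration of the scheduling idea: each procedure is assumed to be a generator that yields `None` after an inconclusive iteration and a verdict string once it succeeds, and the `scripted` helper is a hypothetical stand-in for Algorithms 1-3.

```python
from typing import Iterator, List, Optional

POSITIVE = ("TERMINATES", "DOES NOT TERMINATE")


def lockstep(*procedures: Iterator[Optional[str]]) -> str:
    """Run one iteration of each procedure in turn until one gives a verdict."""
    active = list(procedures)
    while active:
        for proc in list(active):
            try:
                verdict = next(proc)          # one loop iteration of proc
            except StopIteration:
                active.remove(proc)           # proc gave up (UNKNOWN)
                continue
            if verdict in POSITIVE:
                return verdict                # stops all remaining procedures
    return "UNKNOWN"


def scripted(verdicts: List[Optional[str]]) -> Iterator[Optional[str]]:
    """Hypothetical procedure that replays a fixed sequence of outcomes."""
    yield from verdicts


first_hit = lockstep(scripted([None, None, "TERMINATES"]),
                     scripted([None, "DOES NOT TERMINATE"]))
exhausted = lockstep(scripted([None]), scripted([]))
```

Here the second procedure reaches a verdict in its second iteration, so the first one is never asked for its third.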


6 Evaluation 


We have implemented algorithms for proving termination and non-termination 
in a tool called FREQTERM.⁴ It is developed on top of FREQHORN [16], uses it 
for Horn solving, and supports other Horn solvers, SPACER3 [26] and μZ [24], 
as well. To solve ∀∃-formulas, FREQTERM uses the AE-VAL tool [15]. All the 
symbolic reasoning in the end is performed by the Z3 SMT solver [11]. 
FREQTERM takes as input a program encoded as a system of linear constrained 
Horn clauses (CHC). It supports any programming language, as long 
as a translator from it to CHCs exists. For encoding benchmarks to CHCs, we 
used SEAHORN v.0.1.0-rc3. To the best of our knowledge, FREQTERM is the 
only (non)-termination prover that supports a selection of Horn solvers in the 
backend. This allows the prover to leverage advancements in Horn solving easily. 
We have compared FREQTERM against APROVE rev. c181f40 [18], ULTIMATE 
AUTOMIZER v.0.1.23 [22], and HIPTNT+ v.1.0 [30]. The rest of the section 
summarizes three sets of experiments. Sections 6.1 and 6.2 discuss the comparison 
on small but tricky programs, respectively terminating and non-terminating, 
which shows that our approach is applicable to a wide range of conceptually 
challenging problems. In Sect. 6.3, we target several large-scale benchmarks and show 
that FREQTERM is capable of significantly pushing the boundaries of termination 
and non-termination proving. In total, we considered 856 benchmarks of various 
size and complexity. All experiments were conducted on a Linux SMP machine, 
Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40 GHz, 56 CPUs, 377 GB RAM. 


6.1 Performance on Terminating Benchmarks 


We considered 171 terminating programs⁵ from the Termination category of 
SVCOMP and programs crafted by ourselves. Altogether, four tools in our exper- 
iment were able to prove termination of 168 of them within a timeout of 60s and 


4 The source code of the tool is publicly available at https://goo.gl/HecBWce. 
5 These benchmarks are available at https://goo.gl/MPimXE. 
[Scatterplots omitted; panel (b): non-terminating examples (176).] 

Fig. 4. FREQTERM vs respectively ULTIMATE AUTOMIZER, APROVE, and HIPTNT+. 


left only three programs without a verdict. APROVE verified 76 benchmarks, 
HIPTNT+ 90 (including 3 that no other tool solved), ULTIMATE AUTOMIZER 
105 (including 4 that no other tool solved). FREQTERM, implementing 
Algorithms 1-2 and relying on different solvers, verified in total 155 (including 30 
that no other tool solved). In particular, Algorithm 1 instantiated with SPACER3 
proved termination of 88 programs, with μZ 79, and with FREQHORN 80. 
Algorithm 2 instantiated with SPACER3 proved termination of 92 programs, with μZ 
109, and with FREQHORN 74. 

A scatterplot with logarithmic scale on the axes in Fig. 4(a) shows compar- 
isons of best running times of FREQTERM vs the running times of competing 
tools. Each point in a plot represents a pair of the FREQTERM run (x-axis) and 
the competing tool run (y-axis). Intuitively, green points represent cases when 
FREQTERM outperforms the competitor. On average, for programs solved by 
both FREQTERM and ULTIMATE AUTOMIZER, FREQTERM is 29 times faster 
(speedup calculated as a ratio of geometric means of the corresponding runs). 
In a similar setting, FREQTERM is 32 times faster than APROVE. However, 
FREQTERM is 2 times slower than HIPTNT+. The evaluation further revealed 
(in Sect. 6.3) that the latter tool is efficient only on small programs (around 10 
lines of code each), and for large-scale benchmarks it exceeds the timeout. 
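
For reference, the speedup metric used above (a ratio of geometric means over the commonly solved programs) can be computed as in the following Python sketch. The function names are hypothetical and the running times in the example are made-up numbers, not measurements from the paper.

```python
import math
from typing import Sequence


def geometric_mean(times: Sequence[float]) -> float:
    """Geometric mean of positive running times, computed via logarithms."""
    return math.exp(sum(math.log(t) for t in times) / len(times))


def speedup(ours: Sequence[float], theirs: Sequence[float]) -> float:
    """How many times faster 'ours' is on the benchmarks solved by both tools."""
    assert len(ours) == len(theirs), "one pair of runs per common benchmark"
    return geometric_mean(theirs) / geometric_mean(ours)


# Made-up running times in seconds for two commonly solved benchmarks.
example = speedup(ours=[1.0, 4.0], theirs=[10.0, 40.0])
```

The geometric mean is a common choice for such comparisons because a single very slow run influences it less than it would influence an arithmetic mean.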
6.2 Performance on Non-terminating Benchmarks 


We considered 176 non-terminating programs⁶ from the Termination category of 
SVCOMP and programs crafted by ourselves. Altogether, four tools proved 
non-termination of 172 of them: APROVE 35, HIPTNT+ 92, ULTIMATE 
AUTOMIZER 123, and Algorithm 3 implemented in FREQTERM 152. Additionally, 
we evaluated the effect of ∀∃-solving in FREQTERM. For that reason, we 
implemented a version of Algorithm 3 in which non-termination is reduced to 
safety, but the conceptual SyGuS-based refinement generator remained the same. 
This implementation used SPACER3 for proving that the candidate refinement 
can never exit the loop. Among 176 benchmarks, such routine solved only 105, 
which is 30% fewer than Algorithm 3. However, it managed to verify 8 bench- 
marks that Algorithm 3 could not verify (we believe, because SPACER3 was able 
to add an auxiliary inductive invariant). 

The logarithmic scatterplot in Fig. 4(b) compares the running times of FREQTERM 
with those of the competing tools. On average, FREQTERM is 41 times faster than 
ULTIMATE AUTOMIZER, 73 times faster than APROVE, and exhibits roughly 
similar runtimes to HIPTNT+ (again, here we considered only programs solved 
by both tools). Based on these experiments, we conclude that currently 
FREQTERM is more effective and more efficient at synthesizing non-terminating 
program refinements than at synthesizing termination arguments. 


6.3 Large-Scale Benchmarks 


We considered some large-scale benchmarks for evaluation arising from Event- 
Condition-Action (ECA) systems that describe reactive behavior [1]. We consid- 
ered various modifications of five challenging ECAs.⁷ Each ECA consists of one 
large loop, where each iteration reads an input and modifies its internal state. 
If an unexpected input is read, the ECA terminates. 

In our first case study, we aimed to prove non-termination of the given ECAs, 
i.e., that for any reachable internal state there exists an input value that would 
keep the ECA alive. The main challenge appeared to be in the size of benchmarks 
(up to 10000 lines of C code per loop) and reliance on an auxiliary inductive 
invariant. With the extra support of SPACER3 to provide the invariant, 
FREQTERM was able to prove non-termination of a wide range of programs. Among 
all the competing tools, only ULTIMATE AUTOMIZER was able to handle these 
benchmarks, but it verified only a small fraction of them within a 2 h timeout. In 
contrast, FREQTERM solved 301 out of 302 tasks and outperformed ULTIMATE 
AUTOMIZER by up to several orders of magnitude (i.e., from seconds to hours). 
Table 1 contains a brief summary of our experimental evaluation.⁸ 

In our second case study, we instrumented the ECAs by adding extra condi- 
tions to the loop guards, thus imposing an implicit upper bound on the number 


6 These benchmarks are available at https://goo.gl/bZbuA2. 
7 These benchmarks are available at https://goo.gl/7mc2Ww. 
8 To calculate average timings, we excluded cases when the tool exceeded timeout. 
Table 1. FREQTERM vs ULTIMATE AUTOMIZER on non-terminating ECAs (302). 

Benchmarks                         | FREQTERM            | ULTIMATE AUTOMIZER 
Class | # of tasks | Avg # of LoC  | # solved | Avg time | # solved | Avg time 
1 & 2 | 122        | 500           | 122      | 5 sec    | 3        | 27 min 
3     | 60         | 1600          | 60       | 56 sec   | 0        | ∞ 
4     | 60         | 4700          | 60       | 9 min    | 6        | 82 min 
5     | 60         | 10000         | 59       | 52 min   | 0        | ∞ 


Table 2. FREQTERM vs ULTIMATE AUTOMIZER on terminating ECAs (207). 

Benchmarks                         | FREQTERM            | ULTIMATE AUTOMIZER 
Class | # of tasks | Avg # of LoC  | # solved | Avg time | # solved | Avg time 
1 & 2 | 97         | 500           | 97       | 8 sec    | 96       | 73 sec 
3     | 40         | 1600          | 40       | 3 min    | 12       | 56 min 
4     | 35         | 4700          | 35       | 10 min   | 27       | 19 min 
5     | 35         | 10000         | 34       | 65 min   | 19       | 99 min 


of loop iterations, and applied tools to prove termination⁹ (shown in Table 2). 
Again, only ULTIMATE AUTOMIZER was able to compete with FREQTERM, and 
interestingly it was more successful here than in the first case study. Encourag- 
ingly, FREQTERM solved all but one instance and was consistently faster. 


7 Related Work 


Proving Termination. A wide range of state-of-the-art methods are based on iter- 
ative reasoning driven by counterexamples [4,5,9,10,19,21,23,27,29,36] whose 
goal is to show that transitions cannot be executed forever. These approaches 
typically combine termination arguments, proven independently, but none of 
them leverages the syntax of programs during the analysis. 

A smaller number of termination analyzers are based on various types 
of learning. In particular, [39] discovers a termination argument from attempts 
to prove that no program state is terminating; [34] exploits information derived 
from tests; [37] guesses and checks transition invariants (over-approximations to 
the reachable transitive closure of the transition relation) from libraries of 
templates. The closest to our approach, [31] guesses and checks transition invariants 
using loop guards and branch conditions. In contrast, our algorithms guess lower 
bounds for auxiliary program counters and extensively use all available source 
code for guessing candidates. 


9 The task of adding interesting guards appeared to be non-trivial, so we were able to 
instrument only a part of all non-terminating benchmarks. 
Proving Non-termination. Traditional algorithms, e.g. [3,6,8,20,22], are based 
on a search for lasso-shaped traces and a discovery of recurrence sets, i.e., states 
that are visited infinitely often. For instance, [32] searches for a geometric series 
in lasso-shaped traces. Our algorithm discovers existential recurrence sets and 
does not deal with traces at all: it handles their abstraction via a ∀∃-formula. 

A reduction to safety attracts significant attention here as well. In particu- 
lar, [40] relies only on invariant generation to show that the loop guard is also 
satisfied; [19] infers weakest preconditions over inputs, under which the program is 
non-terminating; and [7,28] iteratively eliminate terminating traces through a 
loop by adding extra assumptions. In contrast, our approach does not reduce to 
safety, and thus does not necessarily require invariants. However, we observed 
that if provided, in practice they often accelerate our verification process. 


Syntax-Guided Synthesis. SyGuS [2] is applied to various tasks related to pro- 
gram synthesis, e.g., [13,17,25,33,35,41]. However, the formal grammar in those 
applications is typically given or constructed from user-provided examples. To 
the best of our knowledge, the only application of SyGuS to automatic pro- 
gram analysis was proposed by [14,16], and it inspired our approach. Originally, 
the formal grammar, constructed from the verification condition, was iteratively 
used to guess and check only inductive invariants. In this paper, we showed that 
a similar reasoning is practical and easily transferable across applications. 


8 Conclusion 


We have presented new algorithms for synthesis of termination arguments and 
non-terminating program refinements. Driven by SyGuS, they iteratively gen- 
erate candidate formulas which tend to follow syntactic patterns obtained from 
the source code. By construction, the number of possible candidates is always 
finite, thus the search space is always relatively small. The algorithms rely on 
recent advances in constraint solving, they do not depend on a particular backend 
engine, and thus performance of checking validity of a candidate can be improved 
by advancements in solvers. Our implementation FREQTERM is evaluated on a 
wide range of terminating and non-terminating benchmarks. It is competitive 
with the state of the art, and it significantly outperforms other tools when proving 
non-termination of large-scale Event-Condition-Action systems. 

In future work, it would be interesting to investigate synergetic ways of inte- 
grating the proposed algorithms together, as well as exploiting strengths of dif- 
ferent backend Horn solvers for different verification tasks. 
Abstract. Hyperproperties are properties of sets of computation traces. 
In this paper, we study quantitative hyperproperties, which we define as 
hyperproperties that express a bound on the number of traces that may 
appear in a certain relation. For example, quantitative non-interference 
limits the amount of information about certain secret inputs that is 
leaked through the observable outputs of a system. Quantitative non- 
interference thus bounds the number of traces that have the same observ- 
able input but different observable output. We study quantitative hyper- 
properties in the setting of HyperLTL, a temporal logic for hyperproper- 
ties. We show that, while quantitative hyperproperties can be expressed 
in HyperLTL, the running time of the HyperLTL model checking algo- 
rithm is, depending on the type of property, exponential or even doubly 
exponential in the quantitative bound. We improve this complexity with 
a new model checking algorithm based on model-counting. The new algo- 
rithm needs only logarithmic space in the bound and therefore improves, 
depending on the property, exponentially or even doubly exponentially 
over the model checking algorithm of HyperLTL. In the worst case, the 
new algorithm needs polynomial space in the size of the system. Our 
Max#Sat-based prototype implementation demonstrates, however, that 
the counting approach is viable on systems with nontrivial quantitative 
information flow requirements such as a passcode checker. 


1 Introduction 


Model checking algorithms [17] are the cornerstone of computer-aided verifica- 
tion. As their input consists of both the system under verification and a logical 
formula that describes the property to be verified, they uniformly solve a wide 
range of verification problems, such as all verification problems expressible in 
linear-time temporal logic (LTL), computation-tree logic (CTL), or the modal 
μ-calculus. Recently, there has been a lot of interest in extending model checking 
from standard trace and tree properties to information flow policies like obser- 
vational determinism or quantitative information flow. Such policies are called 
hyperproperties [21] and can be expressed in HyperLTL [18], an extension of LTL 
with trace quantifiers and trace variables. For example, observational determin- 
ism [47], the requirement that any pair of traces that have the same observable 
input also have the same observable output, can be expressed as the following 
HyperLTL formula: $\forall\pi.\forall\pi'.\ \square(\pi =_I \pi') \rightarrow \square(\pi =_O \pi')$. For many information 
flow policies of interest, including observational determinism, there is no longer 
a need for property-specific algorithms: it has been shown that the standard 
HyperLTL model checking algorithm [26] performs just as well as a specialized 
algorithm for the respective property. 

The class of hyperproperties studied in this paper is one where, by contrast, 
the standard model checking algorithm performs badly. We are interested in 
quantitative hyperproperties, i.e., hyperproperties that express a bound on the 
number of traces that may appear in a certain relation. A prominent exam- 
ple of this class of properties is quantitative non-interference [43,45], where 
we allow some flow of information but, at the same time, limit the amount 
of information that may be leaked. Such properties are used, for example, to 
describe the correct behavior of a password check, where some information 
flow is unavoidable (“the password was incorrect"), and perhaps some extra 
information flow is acceptable (“the password must contain a special charac- 
ter"), but the information should not suffice to guess the actual password. In 
HyperLTL, quantitative non-interference can be expressed [18] as the formula 
$\forall\pi_0.\forall\pi_1 \ldots \forall\pi_{2^c}.\ \bigl(\bigwedge_{i} \square(\pi_i =_I \pi_0)\bigr) \rightarrow \bigl(\bigvee_{i \neq j} \square(\pi_i =_O \pi_j)\bigr)$. 
The formula states that there do not exist $2^c + 1$ traces (corresponding to more 
than c bits of information) with the same observable input but different observable 
output. The bad performance of the standard model checking algorithm is a 
consequence of the fact that the $2^c + 1$ traces are tracked simultaneously. For this 
purpose, the model checking algorithm builds and analyzes a $(2^c + 1)$-fold 
self-composition of the system. 

We present a new model checking algorithm for quantitative hyperproper- 
ties that avoids the construction of the huge self-composition. The key idea of 
our approach is to use counting rather than checking as the basic operation. 
Instead of building the self-composition and then checking the satisfaction of 
the formula, we add new atomic propositions and then count the number of 
sequences of evaluations of the new atomic propositions that satisfy the specifi- 
cation. Quantitative hyperproperties are expressions of the following form: 


$$\forall \pi_1 \ldots \forall \pi_k.\ \varphi \rightarrow (\#\sigma : X.\ \psi \triangleleft n),$$


where $\triangleleft \in \{\leq, <, \geq, >, =\}$. The universal quantifiers introduce a set of reference
traces against which other traces can be compared. The formulas $\varphi$ and $\psi$ are
HyperLTL formulas. The counting quantifier $\#\sigma : X.\ \psi$ counts the number of
paths $\sigma$ with different valuations of the atomic propositions $X$ that satisfy $\psi$. The
requirement that no more than c bits of information are leaked is the following 
quantitative hyperproperty: 


$$\forall \pi.\ \#\sigma : O.\ \Box(\pi =_I \sigma) \leq 2^c$$

As we show in the paper, such expressions do not change the expressiveness of
the logic; however, they allow us to express quantitative hyperproperties in expo- 
nentially more concise form. The counting-based model checking algorithm then 
maintains this advantage with a logarithmic counter, resulting in exponentially 
better performance in both time and space. 

The viability of our counting-based model checking algorithm is demonstrated
on a SAT-based prototype implementation. For quantitative hyperproperties
of interest, such as bounded leakage of a password checker, our algorithm
shows promising results, as it significantly outperforms existing model checking 
approaches. 


1.1 Related Work 


Quantitative information-flow has been studied extensively in the literature. See,
for example, the following selection of contributions on this topic: [1,14,19,32,
34,43]. Multiple verification methods for quantitative information-flow were
proposed for sequential systems, for example, static analysis techniques [15],
approximation methods [35], equivalence relations [3,22], and randomized
methods [35]. Quantitative information-flow for multi-threaded programs was
considered in [11].

The study of quantitative information-flow in a reactive setting gained a 
lot of attention recently after the introduction of hyperproperties [21] and the 
idea of verifying the self-composition of a reactive system [6] in order to relate 
traces to each other. There are several possibilities to measure the amount of 
leakage, such as Shannon entropy [15,24,37], guessing entropy [3,34], and min- 
entropy [43]. A classification of quantitative information-flow policies as safety 
and liveness hyperproperties was given in [46]. While several verification
techniques for hyperproperties exist [5,31,38,42], the literature was missing general
approaches to quantitative information-flow control. SecLTL [25] was introduced
as a first general approach to model check (quantitative) hyperproperties, before
HyperLTL [18], together with its corresponding model checker [26], was introduced
as a temporal logic for hyperproperties that subsumes the previous approaches.

Using counting to compute the number of solutions of a given formula is stud- 
ied in the literature as well and includes many probabilistic inference problems, 
such as Bayesian net reasoning [36], and planning problems, such as computing 
robustness of plans in incomplete domains [40]. State-of-the-art tools for propo- 
sitional model counting are Relsat [33] and c2d [23]. Algorithms for counting 
models of temporal logics and automata over infinite words have been introduced
in [27,28,44]. The counting of projected models, i.e., when some parts of
the models are irrelevant, was studied in [2], for which tools such as #CLASP [2]
and DSharp_P [2,41] exist. Our SAT-based prototype implementation is based on
a reduction to a Max#SAT [29] instance, for which a corresponding tool exists. 

Among the existing tools for computing the amount of information leakage,
such as QUAIL [8], which analyzes programs written in a specific
while-language, and LeakWatch [12], which estimates the amount of leakage in
Java programs, Moped-QLeak [9] is closest to our approach. However, its
approach of computing a symbolic summary as an Algebraic Decision Diagram is, in
contrast to our approach, solely based on model counting, not maximum model
counting.


2 Preliminaries 


2.1 HyperLTL 


HyperLTL [18] extends linear-time temporal logic (LTL) with trace variables 
and trace quantifiers. Let $AP$ be a set of atomic propositions. A trace $t$ is an
infinite sequence over subsets of the atomic propositions. We define the set of
traces $TR := (2^{AP})^\omega$. A subset $T \subseteq TR$ is called a trace property and a subset
$H \subseteq 2^{TR}$ is called a hyperproperty. We use the following notation to manipulate
traces: let $t \in TR$ be a trace and $i \in \mathbb{N}$ be a natural number. $t[i]$ denotes the
$i$-th element of $t$. Therefore, $t[0]$ represents the starting element of the trace. Let
$j \in \mathbb{N}$ and $j \geq i$. $t[i,j]$ denotes the sequence $t[i]\ t[i+1] \ldots t[j-1]\ t[j]$. $t[i,\infty]$
denotes the infinite suffix of $t$ starting at position $i$.
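
The trace notation can be made concrete for ultimately periodic traces, which are the only infinite traces a program can store explicitly. The following is a minimal sketch, assuming a hypothetical `LassoTrace` class (not part of the paper) that represents a trace as a finite prefix followed by a repeated loop; `at`, `window`, and `suffix` are illustrative names for $t[i]$, $t[i,j]$, and $t[i,\infty]$.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Letter = FrozenSet[str]  # one position of a trace: a subset of AP


@dataclass(frozen=True)
class LassoTrace:
    """Hypothetical helper: the trace prefix . loop^omega (loop must be non-empty)."""
    prefix: Tuple[Letter, ...]
    loop: Tuple[Letter, ...]

    def at(self, i: int) -> Letter:
        """t[i]: the i-th element; position 0 is the starting element."""
        if i < len(self.prefix):
            return self.prefix[i]
        return self.loop[(i - len(self.prefix)) % len(self.loop)]

    def window(self, i: int, j: int) -> Tuple[Letter, ...]:
        """t[i, j]: the finite sequence t[i] t[i+1] ... t[j], both ends included."""
        return tuple(self.at(p) for p in range(i, j + 1))

    def suffix(self, i: int) -> "LassoTrace":
        """t[i, oo]: the infinite suffix starting at position i."""
        if i <= len(self.prefix):
            return LassoTrace(self.prefix[i:], self.loop)
        k = (i - len(self.prefix)) % len(self.loop)
        # Inside the loop: rotate it so that the suffix starts at offset k.
        return LassoTrace((), self.loop[k:] + self.loop[:k])


t = LassoTrace((frozenset({"a"}),), (frozenset(), frozenset({"b"})))
```

Here `t` stands for the trace $\{a\}\,(\{\}\,\{b\})^\omega$; taking a suffix never changes the loop's content, only its rotation.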


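HyperLTL Syntax. Let $\mathcal{V}$ be an infinite supply of trace variables. The syntax of
HyperLTL is given by the following grammar:


$$\begin{array}{l} \psi ::= \exists \pi.\ \psi \mid \forall \pi.\ \psi \mid \varphi \\ \varphi ::= a_\pi \mid \neg\varphi \mid \varphi \vee \varphi \mid \bigcirc\varphi \mid \varphi\ \mathcal{U}\ \varphi \end{array}$$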


where $a \in AP$ is an atomic proposition and $\pi \in \mathcal{V}$ is a trace variable. Note
that atomic propositions are indexed by trace variables. The quantification over
traces makes it possible to express properties like "on all traces $\psi$ must hold",
which is expressed by $\forall \pi.\ \psi$. Dually, one can express that "there exists a trace
such that $\psi$ holds", which is denoted by $\exists \pi.\ \psi$. The derived operators $\Diamond$, $\Box$,
and $\mathcal{W}$ are defined as for LTL. We abbreviate the formula $\bigwedge_{x \in X} (x_\pi \leftrightarrow x_{\pi'})$,
expressing that the traces $\pi$ and $\pi'$ are equal with respect to a set $X \subseteq AP$ of
atomic propositions, by $\pi =_X \pi'$. Furthermore, we call a trace variable $\pi$ free in
a HyperLTL formula if there is no quantification over $\pi$ and we call a HyperLTL
formula $\varphi$ closed if there exists no free trace variable in $\varphi$.


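HyperLTL Semantics. A HyperLTL formula defines a hyperproperty, i.e., a set
of sets of traces. A set $T$ of traces satisfies the hyperproperty if it is an element
of this set of sets. Formally, the semantics of HyperLTL formulas is given with
respect to a trace assignment $\Pi$ from $\mathcal{V}$ to $TR$, i.e., a partial function mapping
trace variables to actual traces. $\Pi[\pi \mapsto t]$ denotes that $\pi$ is mapped to $t$, with
everything else mapped according to $\Pi$. $\Pi[i,\infty]$ denotes the trace assignment
that is equal to $\Pi(\pi)[i,\infty]$ for all $\pi$.

$$\begin{array}{lll}
\Pi \models_T \exists \pi.\ \psi & \text{iff} & \text{there exists } t \in T :\ \Pi[\pi \mapsto t] \models_T \psi \\
\Pi \models_T \forall \pi.\ \psi & \text{iff} & \text{for all } t \in T :\ \Pi[\pi \mapsto t] \models_T \psi \\
\Pi \models_T a_\pi & \text{iff} & a \in \Pi(\pi)[0] \\
\Pi \models_T \neg\psi & \text{iff} & \Pi \not\models_T \psi \\
\Pi \models_T \psi_1 \vee \psi_2 & \text{iff} & \Pi \models_T \psi_1 \text{ or } \Pi \models_T \psi_2 \\
\Pi \models_T \bigcirc\psi & \text{iff} & \Pi[1,\infty] \models_T \psi \\
\Pi \models_T \psi_1\ \mathcal{U}\ \psi_2 & \text{iff} & \text{there exists } i \geq 0 :\ \Pi[i,\infty] \models_T \psi_2 \\
 & & \text{and for all } 0 \leq j < i \text{ we have } \Pi[j,\infty] \models_T \psi_1
\end{array}$$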


We say a set of traces $T$ satisfies a HyperLTL formula $\varphi$ if $\Pi \models_T \varphi$, where $\Pi$ is
the empty trace assignment.


2.2 System Model 


A Kripke structure is a tuple $K = (S, s_0, \delta, AP, L)$ consisting of a set of states
$S$, an initial state $s_0 \in S$, a transition function $\delta : S \rightarrow 2^S$, a set of atomic
propositions $AP$, and a labeling function $L : S \rightarrow 2^{AP}$, which labels every state
with a set of atomic propositions. We assume that each state has a successor,
i.e., $\delta(s) \neq \emptyset$. This ensures that every run on a Kripke structure can always be
extended to an infinite run. We define a path of a Kripke structure as an infinite
sequence of states $s_0 s_1 \cdots \in S^\omega$ such that $s_0$ is the initial state of $K$ and
$s_{i+1} \in \delta(s_i)$ for every $i \in \mathbb{N}$. We denote the set of all paths of $K$ that start in a state $s$
with $Paths(K, s)$. Furthermore, $Paths^*(K, s)$ denotes the set of all path prefixes
and $Paths^\omega(K, s)$ the set of all path suffixes. A trace of a Kripke structure is an
infinite sequence of sets of atomic propositions $L(s_0), L(s_1), \ldots \in (2^{AP})^\omega$, such
that $s_0$ is the initial state of $K$ and $s_{i+1} \in \delta(s_i)$ for every $i \in \mathbb{N}$. We denote the
set of all traces of $K$ that start in a state $s$ with $TR(K, s)$. We say that a Kripke
structure $K$ satisfies a HyperLTL formula $\varphi$ if its set of traces satisfies $\varphi$, i.e., if
$\Pi \models_{TR(K, s_0)} \varphi$, where $\Pi$ is the empty trace assignment.
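
As an illustration of the system model, a Kripke structure can be stored as two dictionaries and its trace prefixes enumerated by unrolling the transition function. This is a minimal sketch under our own assumptions: `trace_prefixes` and the two-state example are hypothetical and only approximate the infinite traces in $TR(K, s_0)$ by their prefixes of a fixed length.

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

State = Hashable
Letter = FrozenSet[str]


def trace_prefixes(delta: Dict[State, Set[State]],
                   label: Dict[State, Letter],
                   s0: State,
                   length: int) -> Set[Tuple[Letter, ...]]:
    """Label sequences L(s0) L(s1) ... of the given length along paths from s0."""
    # The text assumes every state has a successor, so no path gets stuck.
    assert all(delta[s] for s in delta), "every state needs a successor"
    paths = [(s0,)]
    for _ in range(length - 1):
        paths = [p + (s,) for p in paths for s in delta[p[-1]]]
    # Different paths may carry the same labels, hence the set of traces.
    return {tuple(label[s] for s in p) for p in paths}


# Hypothetical example: state 0 may stay or move to the absorbing state 1.
delta = {0: {0, 1}, 1: {1}}
label = {0: frozenset(), 1: frozenset({"a"})}
prefixes = trace_prefixes(delta, label, 0, 3)
```

The example yields three prefixes of length 3, one for each point at which the structure may switch from state 0 to state 1.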


2.3 Automata over Infinite Words 


In our construction we use automata over infinite words. A Büchi automaton
is a tuple $B = (Q, Q_0, \delta, \Sigma, F)$, where $Q$ is a set of states, $Q_0$ is a set of initial
states, $\delta : Q \times \Sigma \rightarrow 2^Q$ is a transition relation over the alphabet $\Sigma$, and $F \subseteq Q$ are
the accepting states. A run of $B$ on an infinite word $w = \alpha_1 \alpha_2 \cdots \in \Sigma^\omega$ is an infinite sequence
$r = q_0 q_1 \cdots \in Q^\omega$ of states, where $q_0 \in Q_0$ and for each $i \geq 0$, $q_{i+1} \in \delta(q_i, \alpha_{i+1})$.
We define $Inf(r) = \{q \in Q \mid \forall i\, \exists j > i.\ r_j = q\}$. A run $r$ is called accepting if
$Inf(r) \cap F \neq \emptyset$. A word $w$ is accepted by $B$ and called a model of $B$ if there is
an accepting run of $B$ on $w$.
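
For ultimately periodic words, Büchi acceptance reduces to a graph search: a word $u \cdot v^\omega$ is accepted iff, in the product of the automaton with the positions of $v$, some accepting state is reachable and lies on a cycle. The sketch below illustrates this standard argument; `accepts_lasso` and the two-state automaton are our own hypothetical examples, with transitions stored as a dictionary from (state, letter) to successor sets.

```python
def reachable(start, succ):
    """All nodes reachable from `start` (inclusive) via the successor function."""
    seen, stack = set(start), list(start)
    while stack:
        node = stack.pop()
        for nxt in succ(node):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def accepts_lasso(delta, initial, accepting, prefix, loop):
    """Does a nondeterministic Büchi automaton accept the word prefix . loop^omega?"""
    current = set(initial)
    for letter in prefix:
        current = {q2 for q in current for q2 in delta.get((q, letter), ())}

    def succ(node):
        q, pos = node  # automaton state and position inside the loop
        return [(q2, (pos + 1) % len(loop)) for q2 in delta.get((q, loop[pos]), ())]

    for node in reachable([(q, 0) for q in current], succ):
        # An accepting state on a cycle can be visited infinitely often.
        if node[0] in accepting and node in reachable(succ(node), succ):
            return True
    return False


E, A = frozenset(), frozenset({"a"})
# Hypothetical automaton: wait in q0 on {}, move to accepting q1 on {a}.
delta = {("q0", E): {"q0"}, ("q0", A): {"q1"}, ("q1", E): {"q1"}, ("q1", A): {"q1"}}
```

With this automaton, the word $\{\}\{a\}\{\}^\omega$ is accepted, whereas $\{\}^\omega$ is not, because its only run never leaves the non-accepting state `q0`.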

Furthermore, an alternating automaton, whose runs generalize from
sequences to trees, is a tuple $A = (Q, Q_0, \delta, \Sigma, F)$. $Q$, $Q_0$, $\Sigma$, and $F$ are defined
as above and $\delta : Q \times \Sigma \rightarrow \mathbb{B}^+(Q)$ is a transition function, which maps a state
and a symbol into a Boolean combination of states. Thus, a run(-tree) of an
alternating Büchi automaton $A$ on an infinite word $w$ is a $Q$-labeled tree. A
word $w$ is accepted by $A$ and called a model if there exists a run-tree $T$ such
that all paths $p$ through $T$ are accepting, i.e., $Inf(p) \cap F \neq \emptyset$.

A strongly connected component (SCC) in A is a maximal strongly connected 
component of the graph induced by the automaton. An SCC is called accepting 
if one of its states is an accepting state in A. 


3 Quantitative Hyperproperties 


Quantitative Hyperproperties are properties of sets of computation traces that 
express a bound on the number of traces that may appear in a certain relation. In 
the following, we study quantitative hyperproperties that are specified in terms 
of HyperLTL formulas. We consider expressions of the following general form: 


$$\forall \pi_1, \ldots, \pi_k.\ \varphi \rightarrow (\#\sigma : A.\ \psi \triangleleft n)$$


Both the universally quantified variables $\pi_1, \ldots, \pi_k$ and the variable $\sigma$ after the
counting operator $\#$ are trace variables; $\varphi$ is a HyperLTL formula over atomic
propositions $AP$ and free trace variables $\pi_1 \ldots \pi_k$; $A \subseteq AP$ is a set of atomic
propositions; $\psi$ is a HyperLTL formula over atomic propositions $AP$ and free
trace variables $\pi_1 \ldots \pi_k$ and, additionally, $\sigma$. The operator $\triangleleft \in \{\leq, <, =, >, \geq\}$ is
a comparison operator; and $n \in \mathbb{N}$ is a natural number.

For a given set of traces $T$ and a valuation of the trace variables $\pi_1, \ldots, \pi_k$,
the term $\#\sigma : A.\ \psi$ computes the number of traces $\sigma$ in $T$ that differ in their
valuation of the atomic propositions in $A$ and satisfy $\psi$. The expression $\#\sigma : A.\ \psi \triangleleft n$
is true iff the resulting number satisfies the comparison with $n$. Finally, the
complete expression $\forall \pi_1, \ldots, \pi_k.\ \varphi \rightarrow (\#\sigma : A.\ \psi \triangleleft n)$ is true iff for all combinations
$\pi_1, \ldots, \pi_k$ of traces in $T$ that satisfy $\varphi$, the comparison $\#\sigma : A.\ \psi \triangleleft n$ is satisfied.
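
The semantics just described can be sketched directly for a finite set of finite traces. This is only a bounded illustration, not the model checking algorithm of the paper: `project`, `count`, and `holds` are hypothetical names, and the formulas $\varphi$ and $\psi$ are passed as plain Python predicates rather than HyperLTL formulas.

```python
import operator
from itertools import product


def project(trace, props):
    """A-projection: keep only the propositions in `props` at every position."""
    return tuple(letter & props for letter in trace)


def count(traces, A, psi, refs):
    """#sigma : A. psi, i.e. the number of distinct A-projections of traces satisfying psi."""
    return len({project(sigma, A) for sigma in traces if psi(refs, sigma)})


def holds(traces, k, phi, A, psi, compare, n):
    """forall pi_1..pi_k. phi -> (#sigma : A. psi  compare  n) over a finite trace set."""
    return all(compare(count(traces, A, psi, refs), n)
               for refs in product(traces, repeat=k) if phi(refs))


I, O = frozenset({"i"}), frozenset({"o"})
traces = [
    (frozenset({"i"}), frozenset({"o"})),
    (frozenset({"i"}), frozenset()),
    (frozenset(), frozenset()),
]


def same_input(refs, sigma):
    return project(refs[0], I) == project(sigma, I)


def always(refs):
    return True
```

In this example, the first two traces agree on the input `i` but differ on the output `o`, so the count for them is 2; the bound `<= 2` holds while `<= 1` is violated.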


Example 1 (Quantitative non-interference). Quantitative information-flow
policies [13,20,30,34] allow the flow of a bounded amount of information. One way to
measure leakage is with min-entropy [43], which quantifies the amount of
information an attacker can gain given the answer to a single guess about the secret.
The bounding problem [45] for min-entropy is to determine whether that amount
is bounded from above by a constant $2^c$, corresponding to $c$ bits. We assume that
the program whose leakage is being quantified is deterministic, and assume that
the secret input to that program is uniformly distributed. The bounding
problem then reduces to determining that there is no tuple of $2^c + 1$ distinguishable
traces [43,45]. Let $O \subseteq AP$ be the set of observable outputs. A simple
quantitative information flow policy is then the following quantitative hyperproperty,
which bounds the number of distinguishable outputs to $2^c$, corresponding to a
bound of $c$ bits of information:


$$\#\sigma : O.\ \mathit{true} \leq 2^c$$


A slightly more complicated information flow policy is quantitative
non-interference. In quantitative non-interference, the bound must be satisfied for
every individual input. Let $I \subseteq AP$ be the observable inputs to the system.
Quantitative non-interference is the following quantitative hyperproperty¹:


$$\forall \pi.\ \#\sigma : O.\ (\Box(\pi =_I \sigma)) \leq 2^c$$


For each trace $\pi$ in the system, the property checks whether there are more than
$2^c$ traces $\sigma$ that have the same observable input as $\pi$ but different observable
output.
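
For a one-shot deterministic program, the bound of Example 1 can be checked by brute force. The following sketch makes simplifying assumptions that are ours, not the paper's: the program is a Python function of a public (observable) input and a secret, traces have length one, and `max_distinct_outputs`, `leaks_at_most`, and `check_password` are hypothetical names.

```python
def max_distinct_outputs(program, public_inputs, secrets):
    """Largest number of distinguishable outputs over all individual public inputs."""
    return max(len({program(pub, sec) for sec in secrets}) for pub in public_inputs)


def leaks_at_most(program, public_inputs, secrets, c):
    """Check the bound: at most 2^c distinguishable outputs for every public input."""
    return max_distinct_outputs(program, public_inputs, secrets) <= 2 ** c


def check_password(guess, secret):
    # The only observable output is whether the guess was correct.
    return guess == secret


def echo_secret(guess, secret):
    # A leaky program that reveals the whole secret.
    return secret


guesses = range(8)
secrets = range(8)
```

The password check produces two distinguishable outputs per guess and therefore respects a bound of one bit, whereas `echo_secret` needs a bound of three bits for eight possible secrets.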


Example 2 (Deniability). A program satisfies deniability (see, for example, [7, 
10]) when there is no proof that a certain input occurred from simply observing 
the output, i.e., given an output of a program one cannot derive the input that
led to this output. A deterministic program satisfies deniability when each
output can be mapped to at least two inputs. A quantitative variant of deniability
is when we require that the number of corresponding inputs is larger than a given
threshold. Quantitative deniability can be specified as the following quantitative
hyperproperty:


$$\forall \pi.\ \#\sigma : I.\ (\Box(\pi =_O \sigma)) > n$$


For all traces $\pi$ of the system we count the number of sequences $\sigma$ in the system
with different input sequences and the same output sequence as $\pi$, i.e., for the
fixed output sequence given by $\pi$ we count the number of input sequences that
lead to this output.
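
For a deterministic one-shot program, quantitative deniability amounts to counting preimages. The sketch below is an illustration under that simplifying assumption; `satisfies_deniability` is a hypothetical name and the modulo program is an invented example.

```python
from collections import Counter


def satisfies_deniability(program, inputs, n):
    """Every produced output must be explained by more than n different inputs."""
    preimages = Counter(program(x) for x in inputs)
    return all(size > n for size in preimages.values())


def last_two_bits(x):
    # Hypothetical program: reveals only the two low-order bits of its input.
    return x % 4


inputs = range(16)
```

Each of the four outputs of `last_two_bits` has four preimages among the sixteen inputs, so the property holds for thresholds up to 3 and fails for 4.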


4 Model Checking Quantitative Hyperproperties 


We present a model checking algorithm for quantitative hyperproperties based on
model counting. The advantage of the algorithm is that its runtime complexity is
independent of the bound $n$ and thus avoids the $n$-fold self-composition necessary
for any encoding of the quantitative hyperproperty in HyperLTL.

Before introducing our novel counting-based algorithm, we start with a
translation of quantitative hyperproperties into HyperLTL formulas and
establish an exponential lower bound for their representation.


4.1 Standard Model Checking Algorithm: Encoding Quantitative 
Hyperproperties in HyperLTL 


The idea of the reduction is to check a lower bound of n traces by existentially 
quantifying over n traces, and to check an upper bound of n traces by universally 
quantifying over $n + 1$ traces. The resulting HyperLTL formula can be verified
using the standard model checking algorithm for HyperLTL [18]. 


¹ We write $\pi =_A \pi'$ short for $\pi_A = \pi'_A$, where $\pi_A$ is the $A$-projection of $\pi$.


Theorem 1. Every quantitative hyperproperty $\forall \pi_1, \ldots, \pi_k.\ \psi_\iota \rightarrow (\#\sigma : A.\ \psi \triangleleft n)$
can be expressed as a HyperLTL formula. For $\triangleleft \in \{\leq\}$ ($\{<\}$), the HyperLTL
formula has $n + k + 1$ (resp. $n + k$) universal trace quantifiers in addition to
the quantifiers in $\psi_\iota$ and $\psi$. For $\triangleleft \in \{\geq\}$ ($\{>\}$), the HyperLTL formula has $k$
universal trace quantifiers and $n$ (resp. $n + 1$) existential trace quantifiers in
addition to the quantifiers in $\psi_\iota$ and $\psi$. For $\triangleleft \in \{=\}$, the HyperLTL formula
has $k + n + 1$ universal trace quantifiers and $n$ existential trace quantifiers in
addition to the quantifiers in $\psi_\iota$ and $\psi$.


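Proof. For $\triangleleft \in \{\leq\}$, we encode the quantitative hyperproperty
$\forall \pi_1, \ldots, \pi_k.\ \psi_\iota \rightarrow (\#\sigma : A.\ \psi \leq n)$ as the following HyperLTL formula:


$$\forall \pi_1, \ldots, \pi_k.\ \forall \pi'_1, \ldots, \pi'_{n+1}.\ \Big(\psi_\iota \wedge \bigwedge_{i \neq j} \Diamond(\pi'_i \neq_A \pi'_j)\Big) \rightarrow \Big(\bigvee_i \neg\psi[\sigma \mapsto \pi'_i]\Big)$$


where $\psi[\sigma \mapsto \pi'_i]$ is the HyperLTL formula $\psi$ with all occurrences of $\sigma$ replaced
by $\pi'_i$. The formula states that there is no tuple of $n + 1$ traces $\pi'_1, \ldots, \pi'_{n+1}$,
different in the evaluation of $A$, that satisfy $\psi$. In other words, for every
$(n+1)$-tuple of traces $\pi'_1, \ldots, \pi'_{n+1}$ that differ in the evaluation of $A$, one of the paths
must violate $\psi$. For $\triangleleft \in \{<\}$, we use the same formula, with $\forall \pi'_1, \ldots, \pi'_n$ instead
of $\forall \pi'_1, \ldots, \pi'_{n+1}$.

For $\triangleleft \in \{\geq\}$, we encode the quantitative hyperproperty analogously as the
HyperLTL formula


$$\forall \pi_1, \ldots, \pi_k.\ \exists \pi'_1, \ldots, \pi'_n.\ \psi_\iota \rightarrow \Big(\bigwedge_{i \neq j} \Diamond(\pi'_i \neq_A \pi'_j)\Big) \wedge \Big(\bigwedge_i \psi[\sigma \mapsto \pi'_i]\Big)$$


The formula states that there exist paths $\pi'_1, \ldots, \pi'_n$ that differ in the evaluation
of $A$ and that all satisfy $\psi$. For $\triangleleft \in \{>\}$, we use the same formula, with
$\exists \pi'_1, \ldots, \pi'_{n+1}$ instead of $\exists \pi'_1, \ldots, \pi'_n$. Lastly, for $\triangleleft \in \{=\}$, we encode the
quantitative hyperproperty as a conjunction of the encodings for $\leq$ and for $\geq$.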
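
As a small worked instance of the encoding for $\leq$ (our own illustration, with $k = 1$ and $n = 1$), the hyperproperty $\forall \pi_1.\ \psi_\iota \rightarrow (\#\sigma : A.\ \psi \leq 1)$ unfolds into a formula with $n + k + 1 = 3$ universal trace quantifiers. The symmetric conjunct $\Diamond(\pi'_2 \neq_A \pi'_1)$ is equivalent to the one shown and is omitted:

```latex
\forall \pi_1.\ \forall \pi'_1.\ \forall \pi'_2.\
  \bigl(\psi_\iota \wedge \Diamond(\pi'_1 \neq_A \pi'_2)\bigr)
  \rightarrow
  \bigl(\neg\psi[\sigma \mapsto \pi'_1] \vee \neg\psi[\sigma \mapsto \pi'_2]\bigr)
```

Each increment of the bound $n$ adds one further quantified trace variable, which is what makes the encoding grow with the bound.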


Example 3 (Quantitative non-interference in HyperLTL). As discussed in Exam- 
ple 1, quantitative non-interference is the quantitative hyperproperty 


$$\forall \pi.\ \#\sigma : O.\ \Box(\pi =_I \sigma) \leq 2^c,$$


where we measure the amount of leakage with min-entropy [43]. The bounding 
problem for min-entropy asks whether the amount of information leaked by a 
system is bounded by a constant $2^c$ where $c$ is the number of bits. This is encoded
in HyperLTL as the requirement that there are no $2^c + 1$ traces distinguishable
in their output: 


$$\forall \pi_0. \forall \pi_1 \ldots \forall \pi_{2^c}.\ \Big(\bigwedge_i \Box(\pi_i =_I \pi_0)\Big) \rightarrow \bigvee_{i \neq j} \Box(\pi_i =_O \pi_j)$$


This formula is equivalent to the formalization of quantitative non-interference
given in [26].

Model checking quantitative hyperproperties via the reduction to HyperLTL
is very expensive. In the best case, when $\triangleleft \in \{\leq, <\}$, $\psi_\iota$ does not contain
existential quantifiers, and $\psi$ does not contain universal quantifiers, we obtain a
HyperLTL formula without quantifier alternations, where the number of
quantifiers grows linearly with the bound $n$. For $m$ quantifiers, the HyperLTL model
checking algorithm [26] constructs and analyzes the $m$-fold self-composition of
the Kripke structure. The running time of the model checking algorithm is thus
exponential in the bound. If $\triangleleft \in \{\geq, >, =\}$, the encoding additionally introduces
a quantifier alternation. The model checking algorithm checks quantifier
alternations via a complementation of Büchi automata, which adds another exponent,
resulting in an overall doubly exponential running time.

The model checking algorithm we introduce in the next section avoids the 
n-fold self-composition needed in the model checking algorithm of HyperLTL 
and its complexity is independent of the bound n. 
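
To get a feeling for the cost of the reduction, the quantifier counts of Theorem 1 can be tabulated and combined with the fact that an $m$-fold self-composition of a structure with $|S|$ states has up to $|S|^m$ states. This is a back-of-the-envelope sketch with hypothetical function names; it ignores quantifiers inside $\psi_\iota$ and $\psi$ and the additional cost of quantifier alternations.

```python
def encoding_quantifiers(op, n, k):
    """(universal, existential) trace quantifiers added by the encoding of Theorem 1."""
    table = {
        "<=": (k + n + 1, 0),
        "<": (k + n, 0),
        ">=": (k, n),
        ">": (k, n + 1),
        "=": (k + n + 1, n),
    }
    return table[op]


def self_composition_states(num_states, op, n, k):
    """Upper bound |S|^m on the states of the m-fold self-composition."""
    return num_states ** sum(encoding_quantifiers(op, n, k))
```

For instance, a bound of three bits ($n = 2^3$) with one reference trace gives `encoding_quantifiers("<=", 8, 1) == (10, 0)`, so a Kripke structure with only 10 states already leads to a self-composition with up to $10^{10}$ states.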


4.2 Counting-Based Model Checking Algorithm


A Kripke structure $K = (S, s_0, \tau, AP, L)$ violates a quantitative hyperproperty


$$\varphi = \forall \pi_1, \ldots, \pi_k.\ \psi_\iota \rightarrow (\#\sigma : A.\ \psi \triangleleft n)$$


if there is a $k$-tuple $t = (\pi_1, \ldots, \pi_k)$ of traces $\pi_i \in TR(K)$ that satisfies the
formula


$$\exists \pi_1, \ldots, \pi_k.\ \psi_\iota \wedge (\#\sigma : A.\ \psi\ \overline{\triangleleft}\ n)$$


where $\overline{\triangleleft}$ is the negation of the comparison operator $\triangleleft$. The tuple $t$ then satisfies
the property $\psi_\iota$, and the number of $(k+1)$-tuples $t' = (\pi_1, \ldots, \pi_k, \sigma)$ for
$\sigma \in TR(K)$ that satisfy $\psi$ and differ pairwise in the $A$-projection of $\sigma$ satisfies the
comparison $\overline{\triangleleft}\ n$. (The $A$-projection of a sequence $\sigma$ is defined as the sequence
$\sigma_A \in (2^A)^\omega$, such that for every position $i$ and every $a \in A$ it holds that $a \in \sigma_A[i]$
if and only if $a \in \sigma[i]$.) The tuples $t'$ can be captured by the automaton composed
of the product of an automaton $A_{\psi_\iota \wedge \psi}$ that accepts all $(k+1)$-tuples of traces that satisfy
both $\psi_\iota$ and $\psi$ and a $(k+1)$-fold self-composition of $K$. Each accepting run of the
product automaton presents $k + 1$ traces of $K$ that satisfy $\psi_\iota \wedge \psi$. On top of the
product automaton, we apply a special counting algorithm which we explain in
detail in Sect. 4.4 and check if the result satisfies the comparison $\overline{\triangleleft}\ n$.
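
The final comparison step can be sketched as follows, assuming the maximum and minimum counts over all reference tuples have already been computed (possibly as infinity). The function `is_violated` and the string encoding of operators are hypothetical; in particular, the treatment of `=` as "violated if either extreme differs from n" is our reading of the semantics, not a step spelled out in the text.

```python
import math
import operator

# Negation of each comparison operator.
NEGATED = {
    "<=": operator.gt,
    "<": operator.ge,
    ">=": operator.lt,
    ">": operator.le,
    "=": operator.ne,
}


def is_violated(op, n, max_count, min_count):
    """Does some reference tuple have a count satisfying the negated comparison?"""
    if op in ("<=", "<"):
        # An upper bound is broken by the reference tuple with the largest count.
        return NEGATED[op](max_count, n)
    if op in (">=", ">"):
        # A lower bound is broken by the reference tuple with the smallest count.
        return NEGATED[op](min_count, n)
    # Equality fails as soon as either extreme differs from n.
    return NEGATED[op](max_count, n) or NEGATED[op](min_count, n)
```

Using `math.inf` for an infinite count makes every finite upper bound count as violated, which mirrors the role of the infinity check that precedes counting.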
Algorithm 1 gives a general picture of our model checking algorithm. The
algorithm has two parts. The first part applies if the relation $\overline{\triangleleft}$ is one of $\{\geq, >\}$.
In this case, the algorithm checks whether a sequence over $AP_\psi$ (propositions
in $\psi$) corresponds to infinitely many sequences over $A$. This is done by checking
whether the product automaton $B$ has a so-called doubly pumped lasso (DPL), a
subgraph with two connected lassos, with a unique sequence over $AP_\psi$ and
different sequences over $A$. Such a doubly pumped lasso matches the same sequence
over $AP_\psi$ with infinitely many sequences over $A$ (more in Sect. 4.4). If no
doubly pumped lasso is found, a projected model counting algorithm is applied in
the second part of the algorithm in order to compute either the maximum or
the minimum value, corresponding to the comparison operator $\triangleleft$. In the next
subsections, we explain the individual parts of the algorithm in detail.


Algorithm 1. Counting-based Model Checking of Quantitative Hyperproperties

Input: Quantitative Hyperproperty $\varphi = \forall \pi_1 \ldots \pi_k.\ \psi_\iota \rightarrow (\#\sigma : A.\ \psi \triangleleft n)$,
       Kripke Structure $K = (S, s_0, \tau, AP, L)$
Output: $K \models \varphi$

    $B$ = QHLTL2BA($K$, $\pi_1 \ldots \pi_k$, $\psi_\iota \wedge \psi$)
    /* Check Infinity */
    if $\overline{\triangleleft} \in \{>, \geq\}$ then
        ce = DPL($B$)
        if ce $\neq \bot$ then
            return ce
    /* Apply Projected Counting Algorithm */
    if $\overline{\triangleleft} \in \{>, \geq\}$ then
        ce = MaxCount($B$, $n$, $\overline{\triangleleft}$)
    else
        ce = MinCount($B$, $n$, $\overline{\triangleleft}$)
    return ce


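4.3 Büchi Automata for Quantitative Hyperproperties


For a quantitative hyperproperty $\varphi = \forall \pi_1 \ldots \pi_k.\ \psi_\iota \rightarrow (\#\sigma : A.\ \psi \triangleleft n)$ and a
Kripke structure $K = (S, s_0, \tau, AP, L)$, we first construct an alternating automaton
$A_{\psi_\iota \wedge \psi}$ for the HyperLTL property $\psi_\iota \wedge \psi$. Let $A_{\psi_1} = (Q_1, q_{0,1}, \Sigma_1, \delta_1, F_1)$
and $A_{\psi_2} = (Q_2, q_{0,2}, \Sigma_2, \delta_2, F_2)$ be alternating automata for subformulas $\psi_1$ and
$\psi_2$. Let $\Sigma = 2^{AP_\varphi}$ where $AP_\varphi$ are all indexed atomic propositions that appear
in $\varphi$. $A_{\psi_\iota \wedge \psi}$ is constructed using the following rules²:


$$\begin{array}{ll}
\varphi = a_\pi & A_\varphi = (\{q_0\}, q_0, \Sigma, \delta, \emptyset) \text{ where } \delta(q_0, \alpha) = (a_\pi \in \alpha) \\
\varphi = \neg a_\pi & A_\varphi = (\{q_0\}, q_0, \Sigma, \delta, \emptyset) \text{ where } \delta(q_0, \alpha) = (a_\pi \notin \alpha) \\
\varphi = \psi_1 \wedge \psi_2 & A_\varphi = (Q_1 \cup Q_2 \cup \{q_0\}, q_0, \Sigma, \delta, F_1 \cup F_2) \\
 & \text{where } \delta(q_0, \alpha) = \delta_1(q_{0,1}, \alpha) \wedge \delta_2(q_{0,2}, \alpha) \\
 & \text{and } \delta(q, \alpha) = \delta_i(q, \alpha) \text{ when } q \in Q_i \text{ for } i \in \{1, 2\} \\
\varphi = \psi_1 \vee \psi_2 & A_\varphi = (Q_1 \cup Q_2 \cup \{q_0\}, q_0, \Sigma, \delta, F_1 \cup F_2) \\
 & \text{where } \delta(q_0, \alpha) = \delta_1(q_{0,1}, \alpha) \vee \delta_2(q_{0,2}, \alpha) \\
 & \text{and } \delta(q, \alpha) = \delta_i(q, \alpha) \text{ when } q \in Q_i \text{ for } i \in \{1, 2\} \\
\varphi = \bigcirc\psi_1 & A_\varphi = (Q_1 \cup \{q_0\}, q_0, \Sigma, \delta, F_1) \\
 & \text{where } \delta(q_0, \alpha) = q_{0,1} \\
 & \text{and } \delta(q, \alpha) = \delta_1(q, \alpha) \text{ for } q \in Q_1 \\
\varphi = \psi_1\ \mathcal{U}\ \psi_2 & A_\varphi = (Q_1 \cup Q_2 \cup \{q_0\}, q_0, \Sigma, \delta, F_1 \cup F_2) \\
 & \text{where } \delta(q_0, \alpha) = \delta_2(q_{0,2}, \alpha) \vee (\delta_1(q_{0,1}, \alpha) \wedge q_0) \\
 & \text{and } \delta(q, \alpha) = \delta_i(q, \alpha) \text{ when } q \in Q_i \text{ for } i \in \{1, 2\} \\
\varphi = \psi_1\ \mathcal{R}\ \psi_2 & A_\varphi = (Q_1 \cup Q_2 \cup \{q_0\}, q_0, \Sigma, \delta, F_1 \cup F_2 \cup \{q_0\}) \\
 & \text{where } \delta(q_0, \alpha) = \delta_2(q_{0,2}, \alpha) \wedge (\delta_1(q_{0,1}, \alpha) \vee q_0) \\
 & \text{and } \delta(q, \alpha) = \delta_i(q, \alpha) \text{ when } q \in Q_i \text{ for } i \in \{1, 2\}
\end{array}$$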
For a quantified formula $\varphi = \exists \pi.\ \psi_1$, we construct the product automaton of
the Kripke structure $K$ and the Büchi automaton of $\psi_1$. Here we reduce the
alphabet of the automaton by projecting all atomic propositions in $AP_\pi$ away:


$$\begin{array}{ll}
\varphi = \exists \pi.\ \psi_1 & A_\varphi = (Q_1 \times S \cup \{q_0\}, q_0, \Sigma \setminus AP_\pi, \delta, F_1 \times S) \\
 & \text{where } \delta(q_0, \alpha) = \{(q', s') \mid q' \in \delta_1(q_{0,1}, \alpha \cup \alpha'),\ s' \in \tau(s_0),\ (L(s_0))_\pi =_{AP_\pi} \alpha'\} \\
 & \text{and } \delta((q, s), \alpha) = \{(q', s') \mid q' \in \delta_1(q, \alpha \cup \alpha'),\ s' \in \tau(s),\ (L(s))_\pi =_{AP_\pi} \alpha'\}
\end{array}$$


² The construction follows the one presented in [26] with a slight modification on
the labeling of transitions. Labeling over atomic propositions instead of the states
of the Kripke structure suffices, as any nondeterminism in the Kripke structure is
inherently resolved, because we quantify over traces, not paths.


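Given the Büchi automaton for the hyperproperty $\psi_\iota \wedge \psi$ it remains to construct
the product with the $(k+1)$-fold self-composition of $K$. The transitions of the
automaton are defined over labels from $\Sigma = 2^{AP^*}$ where $AP^* = AP_\psi \cup \bigcup_i AP_{\pi_i}$.
This is necessary to identify which transition was taken in each copy of $K$, thus
mirroring a tuple of traces in $K$. For each of the variables $\pi_1, \ldots, \pi_k$ and $\sigma$ we
use the following rule:


$$\begin{array}{ll}
\varphi = \exists \pi.\ \psi_1 & A_\varphi = (Q_1 \times S \cup \{q_0\}, q_0, \Sigma, \delta, F_1 \times S) \\
 & \text{where } \delta(q_0, \alpha) = \{(q', s') \mid q' \in \delta_1(q_{0,1}, \alpha),\ s' \in \tau(s_0),\ (L(s_0))_\pi =_{AP_\pi} \alpha\} \\
 & \text{and } \delta((q, s), \alpha) = \{(q', s') \mid q' \in \delta_1(q, \alpha),\ s' \in \tau(s),\ (L(s))_\pi =_{AP_\pi} \alpha\}
\end{array}$$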


Finally, we transform the resulting alternating automaton to an equivalent Büchi 
automaton following the construction of Miyano and Hayashi [39]. 


4.4 Counting Models of ω-Automata


Computing the number of words accepted by a Büchi automaton can be done
by examining its accepting lassos. Consider, for example, the Büchi automata
over the alphabet $2^{\{a\}}$ in Fig. 1. The automaton on the left has one accepting
lasso $(q_0)^\omega$ and thus has only one model, namely $\{a\}^\omega$. The automaton on the
right has infinitely many accepting lassos $(q_0 \{\})^i \{a\} (q_1 (\{\} \vee \{a\}))^\omega$ that accept
infinitely many different words all of the form $\{\}^* \{a\} (\{\} \vee \{a\})^\omega$. Computing
the models of a Büchi automaton is insufficient for model checking quantitative
hyperproperties as we are not interested in the total number of models. We rather
maximize, respectively minimize, over sequences of subsets of atomic
propositions the number of projected models of the Büchi automaton. For instance,
consider the automaton given in Fig. 2. The automaton has infinitely many
models. However, the maximum number of sequences $\sigma_b \in (2^{\{b\}})^\omega$ that correspond to
accepting lassos in the automaton with a unique sequence $\sigma_a \in (2^{\{a\}})^\omega$ is two:
For example, let $n$ be a natural number. For any model of the automaton and
for each sequence $\sigma_a := \{\}^n \{a\} \{\}^\omega$ the automaton accepts the following two
sequences: $\{b\}^n \{\} \{b\}^\omega$ and $\{b\}^\omega$. Formally, given a Büchi automaton $B$ over $AP$
and a set $A$, such that $A \subseteq AP$, an $A$-projected model (or projected model over
$A$) is defined as a sequence $\sigma_A \in (2^A)^\omega$ that results from the $A$-projection of an
accepting sequence $\sigma \in (2^{AP})^\omega$.


Fig. 1. Büchi automata with one model (left) and infinitely many models (right).


Fig. 2. A two-state Büchi automaton, such that there exist exactly two $\{b\}$-projected
models for each $\{a\}$-projected sequence.


In the following, we define the maximum model counting problem over 
automata and give an algorithm for solving the problem. We show how to use 
the algorithm for model checking quantitative hyperproperties. 


Definition 1 (Maximum Model Counting over Automata (MMCA)).
Given a Büchi automaton $B$ over an alphabet $2^{AP}$ for some set of atomic
propositions $AP$ and sets $X, Y, Z \subseteq AP$, the maximum model counting problem is to
compute


$$\max_{\sigma_Y \in (2^Y)^\omega} |\{\sigma_X \in (2^X)^\omega \mid \exists \sigma_Z \in (2^Z)^\omega.\ \sigma_X \cup \sigma_Y \cup \sigma_Z \in L(B)\}|$$


where $\sigma \cup \sigma'$ is the point-wise union of $\sigma$ and $\sigma'$.


As a first step in our algorithm, we show how to check whether the maximum 
model count is equal to infinity. 


Definition 2 (Doubly Pumped Lasso). For a graph $G$, a doubly pumped
lasso in $G$ is a subgraph that entails a cycle $C_1$ and another different cycle $C_2$
that is reachable from $C_1$.


Fig. 3. Forms of doubly pumped lassos.


In general, we distinguish between two types of doubly pumped lassos as
shown in Fig. 3. We call the lassos with periods $C_1$ and $C_2$ the lassos of the
doubly pumped lasso. A doubly pumped lasso of a Büchi automaton $B$ is one in
the graph structure of $B$. The doubly pumped lasso is called accepting when $C_2$
has an accepting state. A more generalized formalization of this idea is given in
the following theorem.


Theorem 2. Let $B = (Q, q_0, \delta, 2^{AP}, F)$ be a Büchi automaton for some set of
atomic propositions $AP = X \cup Y \cup Z$ and let $\sigma' \in (2^Y)^\omega$. The automaton $B$
has infinitely many $X \cup Y$-projected models $\sigma$ with $\sigma =_Y \sigma'$ if and only if $B$
has an accepting doubly pumped lasso with lassos $\rho$ and $\rho'$ such that: (1) $\rho$ is an
accepting lasso, (2) $tr(\rho) =_Y tr(\rho') =_Y \sigma'$, (3) the period of $\rho'$ shares at least one
state with $\rho$, and (4) $tr(\rho) \neq_X tr(\rho')$.


To check whether there is a sequence $\sigma' \in (2^Y)^\omega$ such that the number of $X \cup Y$-projected
models $\sigma$ of $B$ with $\sigma =_Y \sigma'$ is infinite, we search for a doubly pumped
lasso satisfying the constraints given in Theorem 2. This can be done by applying
the following procedure:

Given a Büchi automaton $B = (Q, q_0, 2^{AP}, \delta, F)$ and sets $X, Y, Z \subseteq AP$,
we construct the following product automaton $B_\times = (Q_\times, q_{\times,0}, 2^{AP} \times 2^{AP}, \delta_\times, F_\times)$
where: $Q_\times = Q \times Q$, $q_{\times,0} = (q_0, q_0)$,
$\delta_\times = \{(s_1, s_2) \xrightarrow{(\alpha, \alpha')} (s'_1, s'_2) \mid s_1 \xrightarrow{\alpha} s'_1,\ s_2 \xrightarrow{\alpha'} s'_2,\ \alpha =_Y \alpha'\}$,
and $F_\times = Q \times F$. The automaton $B$ has infinitely many models $\sigma'$ if there is an accepting lasso
$\rho = (q_0, q_0)(\alpha_1, \alpha'_1) \ldots ((q_j, q'_j)(\alpha_{j+1}, \alpha'_{j+1}) \ldots (q_k, q'_k)(\alpha_{k+1}, \alpha'_{k+1}))^\omega$ in $B_\times$ such that:
$\exists h \leq j.\ q'_h = q_j$, i.e., $B$ has lassos $\rho_1$ and $\rho_2$ that share a state in the period of $\rho_1$,
and $\exists h > j.\ \alpha_h \neq_X \alpha'_h$, i.e., the lassos differ in the evaluation of $X$ in a position
after the shared state and thus allow infinitely many different sequences over $X$
for the same sequence over $Y$. The lasso $\rho$ simulates a doubly pumped lasso in $B$
satisfying the constraints of Theorem 2.
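
The purely structural part of Definition 2 can be sketched as a reachability check on a directed graph. This illustration deliberately ignores acceptance and the label constraints of Theorem 2, and it treats parallel edges between the same pair of nodes as a single edge; `has_doubly_pumped_lasso` is a hypothetical name, not a procedure from the paper.

```python
def reach(graph, sources):
    """Nodes reachable from `sources` (the sources themselves included)."""
    seen, stack = set(), list(sources)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(graph.get(v, ()))
    return seen


def has_doubly_pumped_lasso(graph, start):
    """Is there a reachable cycle C1 from which a different cycle C2 is reachable?"""
    # forward[v]: nodes reachable from v in at least one step.
    forward = {v: reach(graph, graph.get(v, ())) for v in reach(graph, [start])}
    cyclic = {v for v, r in forward.items() if v in r}
    for u in cyclic:
        scc = {v for v in forward[u] if u in forward[v]}
        # First form: two different cycles inside one strongly connected component.
        if any(len(set(graph.get(v, ())) & scc) >= 2 for v in scc):
            return True
        # Second form: a second cycle in another component, reachable from u.
        if any(v in cyclic and v not in scc for v in forward[u]):
            return True
    return False
```

A single self-loop or a simple cycle contains only one cycle and is rejected; a self-loop followed by a second self-loop, or a component with a branching node, is accepted.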


Theorem 3. Given an alternating Büchi automaton $A = (Q, q_0, \delta, 2^{AP}, F)$ for
a set of atomic propositions $AP = X \cup Y \cup Z$, the problem of checking whether
there is a sequence $\sigma' \in (2^Y)^\omega$ such that $A$ has infinitely many $X \cup Y$-projected
models $\sigma$ with $\sigma =_Y \sigma'$ is PSPACE-complete.


The lower and upper bound for the problem can be given by a reduction from
and to the satisfiability problem of LTL [4]. Due to the finite structure of Büchi
automata, if the number of models of the automaton exceeds the exponential
bound $2^{|Q|}$, where $Q$ is the set of states, then the automaton has infinitely many
models.


Lemma 1. For any Büchi automaton $B$, the number of models of $B$ is less than or
equal to $2^{|Q|}$; otherwise it is $\infty$.


Proof. Assume the number of models is larger than $2^{|Q|}$; then there are more
than $2^{|Q|}$ accepting lassos in $B$. By the pigeonhole principle, two of them share
the same $2^{|Q|}$-prefix. Thus, either they are equal or we found a doubly pumped
lasso in $B$.


Corollary 1. Let $B$ be a Büchi automaton over a set of atomic propositions $AP$
and let $X, Y \subseteq AP$. For each sequence $\sigma_Y \in (2^Y)^\omega$ the number of $X \cup Y$-projected
models $\sigma$ with $\sigma =_Y \sigma_Y$ is less than or equal to $2^{|Q|}$; otherwise it is $\infty$.


From Corollary 1, we know that if no sequence $\sigma_Y \in (2^Y)^\omega$ matches infinitely
many $X \cup Y$-projected models then the number of such models is bounded by
$2^{|Q|}$. Each of these models has a run in $B$ which ends in an accepting strongly
connected component. Also from Corollary 1, we know that every model has a
lasso run of length $|Q|$. For each finite sequence $w_Y$ of length $|w_Y| = |Q|$ that
reaches an accepting strongly connected component, we count the number of $X \cup Y$-projected
words $w$ of length $|Q|$ with $w =_Y w_Y$ that end in an accepting
strongly connected component. This number is equal to the maximum model
counting number.


Algorithm 2 Maximum Model Counting

Input: $B = (Q, q_0, 2^{AP}, \delta, F)$, disjoint $X, Y, Z \subseteq AP$, $n \in \mathbb{N}$
Output: $\#_{X,Y,Z}(B) > n$

    SCC = acceptingSCC($B$)
    $i = 1$
    $W = \bigcup_{S \in SCC} S$
    while $i \leq |Q|$ do
        $i = i + 1$
        for $q \in W$ do
            for $(q', \alpha, q) \in \delta$ do
                $W' = W' \cup \{q'\}$
                for $\sigma \in \Pi(q)$ do
                    $\Pi'(q') = \Pi'(q') \cup \{\alpha_{X \cup Y} \cdot \sigma\}$
        $W = W'$
        $W' = \emptyset$
        for $q \in W$ do
            $\Pi(q) = \Pi'(q)$
            $\Pi'(q) = \emptyset$
    return $\max_{X,Y,Z} \Pi(q_0) > n$


Fig. 4. Maximum Model Counting Algorithm (left) and a sketch of a step in this
algorithm (right): Current elements of our working set are $q_1, q_2 \in W$ and $q_3 \in W'$.
If $i = 0$, i.e., we are in the first step of the algorithm, then $q_1$ and $q_2$ are states of
accepting SCCs.
[Sketch labels: $\Pi(q_1) := \{\sigma_1, \ldots, \sigma_k\}$, $\Pi(q_2) := \{\sigma'_1, \ldots, \sigma'_l\}$,
$\Pi(q_3) := \{\alpha_1 \sigma_1, \ldots, \alpha_1 \sigma_k\} \cup \{\alpha_2 \sigma'_1, \ldots, \alpha_2 \sigma'_l\}$]

Algorithm 2 describes the procedure. An algorithm for the minimum model
counting problem is defined in a similar way. The algorithm works in a backwards
fashion starting with states of accepting strongly connected components. In each
iteration $i$, the algorithm maps each state of the automaton to the $X \cup Y$-projected
words of length $i$ that reach an accepting strongly connected component. After
$|Q|$ iterations, the algorithm determines from the mapping of the initial state $q_0$ a
$Y$-projected word of length $|Q|$ with the maximum number of matching $X \cup Y$-projected
words (Fig. 4).
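
The backward propagation described above can be sketched in a few lines. This is our own reading of the prose, not a transcription of Algorithm 2: it assumes that no doubly pumped lasso exists, stores transitions as a hypothetical set of `(state, letter, successor)` triples with letters as frozensets, and uses a simple reachability test in place of a dedicated SCC algorithm.

```python
from collections import defaultdict


def reach(succ, sources):
    seen, stack = set(), list(sources)
    while stack:
        q = stack.pop()
        if q not in seen:
            seen.add(q)
            stack.extend(succ[q])
    return seen


def max_model_count(states, q0, transitions, accepting, X, Y):
    """Maximum, over Y-projected words, of the number of matching X u Y-projected words."""
    succ = defaultdict(set)
    for q, _, q2 in transitions:
        succ[q].add(q2)
    fwd = {q: reach(succ, succ[q]) for q in states}  # reachable in >= 1 step
    # States that share a cycle with some accepting state (accepting SCCs).
    core = {q for q in states if any(f in fwd[q] and q in fwd[f] for f in accepting)}
    # words[q]: projected words of the current length leading from q into `core`.
    words = {q: ({()} if q in core else set()) for q in states}
    for _ in range(len(states)):
        step = {q: set() for q in states}
        for q, letter, q2 in transitions:
            step[q].update((letter & (X | Y),) + w for w in words[q2])
        words = step
    by_y = defaultdict(set)
    for w in words[q0]:
        by_y[tuple(letter & Y for letter in w)].add(w)
    return max((len(ws) for ws in by_y.values()), default=0)


E = frozenset()
X, Y = frozenset({"b"}), frozenset({"a"})
# Hypothetical automaton: three ways to enter the accepting sink state 1.
transitions = {(0, frozenset({"a", "b"}), 1), (0, frozenset({"a"}), 1), (0, E, 1), (1, E, 1)}
result = max_model_count({0, 1}, 0, transitions, {1}, X, Y)
```

In the example, the $Y$-projected word starting with $\{a\}$ is matched by two words that differ only in $b$, so the maximum is 2; propositions outside $X \cup Y$ are projected away and do not increase the count.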


Theorem 4. The decisional version of the maximum model counting problem
over automata (MMCA), i.e., the question whether the maximum is greater than
a given natural number $n$, is in $NP^{\#P}$.


Proof. Let a Büchi automaton $B$ over an alphabet $2^{AP}$ for a set of atomic
propositions $AP$, sets $AP_X, AP_Y, AP_Z \subseteq AP$, and a natural number $n$ be given. We
construct a nondeterministic Turing machine $M$ with access to a #P-oracle as
follows: $M$ guesses a sequence $\sigma_Y \in (2^{AP_Y})^\omega$. It then queries the oracle to compute
a number $c$ such that $c = |\{\sigma_X \in (2^{AP_X})^\omega \mid \exists \sigma_Z \in (2^{AP_Z})^\omega.\ \sigma_X \cup \sigma_Y \cup \sigma_Z \in L(B)\}|$,
which is a #P problem [27]. It remains to check whether $c > n$. If so,
$M$ accepts.


The following theorem summarizes the main findings of this section, which
establish, depending on the property, an exponentially or even doubly exponentially
better algorithm (in the quantitative bound) over the existing model checking
algorithm for HyperLTL.

Theorem 5. Given a Kripke structure $K$ and a quantitative hyperproperty $\varphi$
with bound $n$, the problem whether $K \models \varphi$ can be decided in logarithmic space
in the quantitative bound $n$ and in polynomial space in the size of $K$.


5 A Max#Sat-Based Approach


For existential HyperLTL formulas $\psi_\iota$ and $\psi$, we give a more practical model
checking approach by encoding the automaton-based construction presented in
Sect. 4 into a propositional formula.

Given a Kripke structure $K = (S, s_0, \tau, AP_K, L)$ and a quantitative
hyperproperty $\varphi = \forall \pi_1, \ldots, \pi_k.\ \psi_\iota \rightarrow (\#\sigma : A.\ \psi \triangleleft n)$ over a set of atomic propositions
$AP_\varphi \subseteq AP_K$ and bound $\mu$, our algorithm constructs a propositional formula $\phi$
such that every satisfying assignment of $\phi$ uniquely encodes a tuple of lassos
$(\pi_1, \ldots, \pi_k, \sigma)$ of length $\mu$ in $K$, where $(\pi_1, \ldots, \pi_k)$ satisfies $\psi_\iota$ and $(\pi_1, \ldots, \pi_k, \sigma)$
satisfies $\psi$. To compute the values

$$\max_{\pi_1, \ldots, \pi_k} |\{\sigma_A \mid (\pi_1, \ldots, \pi_k, \sigma) \models \psi_\iota \land \psi\}|$$

(in case $\triangleleft \in \{<, \le\}$) or

$$\min_{\pi_1, \ldots, \pi_k} |\{\sigma_A \mid (\pi_1, \ldots, \pi_k, \sigma) \models \psi_\iota \land \psi\}|$$

(in case $\triangleleft \in \{>, \ge\}$), we pass $\phi$ to a maximum model counter or, respectively, to a minimum
model counter with the appropriate sets of counting and maximization (respectively,
minimization) propositions. From Lemma 1 we know that it is enough
to consider lassos of length exponential in the size of $\varphi$. The size of $\phi$ is thus
exponential in the size of $\varphi$ and polynomial in the size of $K$.

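The construction resembles the encoding of the bounded model checking
approach for LTL [16]. Let $\psi_\iota = \exists \pi'_1 \ldots \pi'_{k'}.\ \psi'_\iota$ and $\psi = \exists \pi''_1 \ldots \pi''_{k''}.\ \psi''$, and
let $AP_{\psi_\iota}$ and $AP_\psi$ be the sets of atomic propositions that appear in $\psi_\iota$ and
$\psi$, respectively. The propositional formula $\phi$ is given as a conjunction of the
following propositional formulas: $\phi = \bigwedge_{i \le k} [\![K]\!]^{k}_{\pi_i} \land [\![K]\!]^{k}_{\sigma} \land [\![\psi_\iota]\!]^{k}_{0} \land [\![\psi]\!]^{k}_{0}$, where: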


– $\mu$ is the length of the considered lassos and is equal to $\mu = 2^{|\psi_\iota \land \psi|} \cdot |S|^{k + k' + k'' + 1} + 1$,
which is one plus the size of the product automaton constructed from the
$(k + k' + k'' + 1)$-fold self-composition and the automaton for $\psi_\iota \land \psi$. The “plus one”
is to additionally check whether the number of models is infinite.

– $[\![K]\!]^{k}_{\pi}$ is the encoding of the transition relation of the copy of $K$ where
atomic propositions are indexed with $\pi$ and up to an unrolling of length
$k$. Each state of $K$ can be encoded as an evaluation of a vector of $\log |S|$
unique propositional variables. The encoding is given by the propositional
formula $I(\vec{v}^{\,\pi}_0) \land \bigwedge_{i=0}^{k-1} T(\vec{v}^{\,\pi}_i, \vec{v}^{\,\pi}_{i+1})$, which encodes all paths of $K$ of length $k$.
The formula $I(\vec{v}^{\,\pi}_0)$ defines the assignment of the initial state. The formulas
$T(\vec{v}^{\,\pi}_i, \vec{v}^{\,\pi}_{i+1})$ define valid transitions in $K$ from the $i$th to the $(i+1)$st state
of a path.

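– $[\![\psi_\iota]\!]^{k}_{0}$ and $[\![\psi]\!]^{k}_{0}$ are constructed using the following rules³:

Formula | $i < k$ | $i = k$
$[\![a_\pi]\!]^{k}_{i}$ | $a^{i}_{\pi}$ | $\bigvee_{j=0}^{k} (l_j \land a^{j}_{\pi})$
$[\![\neg a_\pi]\!]^{k}_{i}$ | $\neg a^{i}_{\pi}$ | $\bigvee_{j=0}^{k} (l_j \land \neg a^{j}_{\pi})$
$[\![\bigcirc \varphi_1]\!]^{k}_{i}$ | $[\![\varphi_1]\!]^{k}_{i+1}$ | $\bigvee_{j=0}^{k} (l_j \land [\![\varphi_1]\!]^{k}_{j})$
$[\![\varphi_1\, \mathcal{U}\, \varphi_2]\!]^{k}_{i}$ | $[\![\varphi_2]\!]^{k}_{i} \lor ([\![\varphi_1]\!]^{k}_{i} \land [\![\varphi_1\, \mathcal{U}\, \varphi_2]\!]^{k}_{i+1})$ | $\bigvee_{j=0}^{k} (l_j \land \langle \varphi_1\, \mathcal{U}\, \varphi_2 \rangle^{k}_{j})$
$\langle \varphi_1\, \mathcal{U}\, \varphi_2 \rangle^{k}_{i}$ | $[\![\varphi_2]\!]^{k}_{i} \lor ([\![\varphi_1]\!]^{k}_{i} \land \langle \varphi_1\, \mathcal{U}\, \varphi_2 \rangle^{k}_{i+1})$ | false
$[\![\varphi_1\, \mathcal{R}\, \varphi_2]\!]^{k}_{i}$ | $[\![\varphi_2]\!]^{k}_{i} \land ([\![\varphi_1]\!]^{k}_{i} \lor [\![\varphi_1\, \mathcal{R}\, \varphi_2]\!]^{k}_{i+1})$ | $\bigvee_{j=0}^{k} (l_j \land \langle \varphi_1\, \mathcal{R}\, \varphi_2 \rangle^{k}_{j})$
$\langle \varphi_1\, \mathcal{R}\, \varphi_2 \rangle^{k}_{i}$ | $[\![\varphi_2]\!]^{k}_{i} \land ([\![\varphi_1]\!]^{k}_{i} \lor \langle \varphi_1\, \mathcal{R}\, \varphi_2 \rangle^{k}_{i+1})$ | true

In case of an existential quantifier over a trace variable $\pi$, we add a copy of
the encoding of $K$ with new variables distinguished by $\pi$:

$[\![\exists \pi.\ \varphi_1]\!]^{k}_{i}$ | $[\![K]\!]^{k}_{\pi} \land [\![\varphi_1]\!]^{k}_{i}$


³ We omitted the rules for Boolean operators for lack of space.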


We define the sets $X = \{a^{i} \mid a \in A,\ i \le k\}$, $Y = \{a^{i} \mid a \in AP_\varphi \setminus A,\ i \le k\}$, and
$Z = P \setminus (X \cup Y)$, where $P$ is the set of all propositions in $\phi$. The maximum model
counting problem is then $MMC(\phi, X, Y, Z)$.
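To make the counting problem concrete, the following Python sketch evaluates a maximum model count by brute force: it maximizes over assignments to the Y variables the number of assignments to the X variables for which some assignment to the Z variables satisfies the formula. This is only an illustration of the definition, not the MaxCount tool used in the experiments; the function name, the callable-based formula representation, and the example formula are all hypothetical.

```python
from itertools import product


def max_model_count(formula, X, Y, Z):
    """Brute-force maximum model counting.

    formula: callable taking a dict {variable name: bool} (hypothetical format).
    X: counting variables, Y: maximization variables, Z: remaining variables.
    """
    best = 0
    for y_vals in product([False, True], repeat=len(Y)):
        count = 0
        for x_vals in product([False, True], repeat=len(X)):
            env = dict(zip(Y, y_vals))
            env.update(zip(X, x_vals))
            # The X-assignment counts if some Z-assignment completes it to a model.
            if any(formula({**env, **dict(zip(Z, z_vals))})
                   for z_vals in product([False, True], repeat=len(Z))):
                count += 1
        best = max(best, count)
    return best


# Hypothetical example formula: (y and (x1 or x2)) or (z and x1 and x2)
def example(e):
    return (e["y"] and (e["x1"] or e["x2"])) or (e["z"] and e["x1"] and e["x2"])


answer = max_model_count(example, ["x1", "x2"], ["y"], ["z"])
```

For y = true, three assignments to (x1, x2) can be completed to a model; for y = false only one can, so `answer` is 3. The enumeration is exponential in the number of variables, which is why a dedicated Max#Sat solver is needed in practice.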


5.1 Experiments 


We have implemented the Max#Sat-based model checking approach from the 
last section. We compare the Max#Sat-based approach to the expansion-based
approach using HyperLTL [26]. Our implementation uses the MaxCount
tool [29]. We use the option in MaxCount that enumerates, rather than approx- 
imates, the number of assignments for the counting variables. We furthermore 
instrumented the tool so that it terminates as soon as a sample is found that 
exceeds the given bound. If no sample is found after one hour, we report a 
timeout. 

Table 1 shows the results on a parameterized benchmark obtained from the
implementation of an 8-bit passcode checker. The parameter of the benchmark is
the bound on the number of bits that is leaked to an adversary, who might, for
example, enter passcodes in a brute-force manner. In all instances, a violation is
found. The results show that the Max#Sat-based approach scales significantly
better than the expansion-based approach.


Table 1. Comparison between the expansion-based approach (MCHyper) and the
Max#Sat-based approach (MCQHyper). #max is the number of maximization
variables (set Y). #count is the number of the counting variables (set X). TO indicates a
time-out after 1 h.


Benchmark | Specification | MCHyper                          | MCQHyper
          |               | #Latches | #Gates | Time (sec)   | #var  | #max | #count | Time (sec)
Pwd_8bit  | 1bit_leak     | 9        | 55     | 0.3          | 97    | 16   | 2      | 1
          | 2bit_leak     |          |        | 0.4          | 176   | 32   | 4      | 1
          | 3bit_leak     |          |        | 1.3          | 336   | 64   | 8      | 2
          | 4bit_leak     |          |        | 97           | 656   | 128  | 16     | 4
          | 5bit_leak     |          |        | TO           | 1296  | 256  | 32     | 8
          | 6bit_leak     |          |        | TO           | 2576  | 512  | 64     | 335
          | 8bit_leak     |          |        | TO           | 10256 | 2048 | 256    | TO


6 Conclusion 


We have studied quantitative hyperproperties of the form $\forall \pi_1, \ldots, \pi_k.\ \varphi \rightarrow (\#\sigma : A.\ \psi \triangleleft n)$,
where $\varphi$ and $\psi$ are HyperLTL formulas, and $\#\sigma : A.\ \psi \triangleleft n$ compares
the number of traces that differ in the atomic propositions $A$ and satisfy $\psi$ to
a threshold n. Many quantitative information flow policies of practical inter- 
est, such as quantitative non-interference and deniability, belong to this class of 
properties. Our new counting-based model checking algorithm for quantitative 
hyperproperties performs at least exponentially better in both time and space 
in the bound n than a reduction to standard HyperLTL model checking. The 
new counting operator makes the specifications exponentially more concise in 
the bound, and our model checking algorithm solves the concise specifications 
efficiently. 

We also showed that the model checking problem for quantitative hyperprop- 
erties can be solved with a practical Max#SAT-based algorithm. The SAT-based 
approach outperforms the expansion-based approach significantly for this class 
of properties. An additional advantage of the new approach is that it can handle 
properties like deniability, which cannot be checked by MCHyper because of the 
quantifier alternation.


Abstract. Relational safety specifications describe multiple runs of the
same program or relate the behaviors of multiple programs. Approaches 
to automatic relational verification often compose the programs and ana- 
lyze the result for safety, but a naively composed program can lead to 
difficult verification problems. We propose to exploit relational speci- 
fications for simplifying the generated verification subtasks. First, we 
maximize opportunities for synchronizing code fragments. Second, we 
compute symmetries in the specifications to reveal and avoid redundant 
subtasks. We have implemented these enhancements in a prototype for 
verifying k-safety properties on Java programs. Our evaluation confirms 
that our approach leads to a consistent performance speedup on a range 
of benchmarks. 


1 Introduction 


The verification of relational program specifications is of wide interest, having 
many applications. Relational specifications can describe multiple runs of the 
same program or relate the behaviors of multiple programs. An example of the 
former is the verification of security properties such as non-interference, where 
different executions of the same program are compared to check whether there 
is a leak of sensitive information. The latter is useful for checking equivalence or 
refinement relationships between programs after applying some transformations 
or during iterative development of different software versions. 

There is a rich history of work on the relational verification of programs. Rep- 
resentative efforts include those that target general analysis using relational pro- 
gram logics and frameworks [4,5,8,27,31] or specific applications such as security 
verification [1,7,9], compiler validation [16,32], and differential program analysis
[17,19,21–23]. These efforts are supported by tools that range from automatic
verifiers to interactive theorem-provers. In particular, many automatic verifiers 
are based on constructing a composition over the programs under consideration, 
where the relational property over multiple runs (of the same or different pro- 
grams) is translated into a functional property over a single run of a composed 
program. This has the benefit that standard techniques and tools for program 
verification can then be applied. 

However, it is also well known that a naively composed program can lead to 
difficult verification problems for automatic verifiers. For example, a sequential
composition of two loops would require effective techniques for generating loop
invariants. In contrast, a parallel composition would provide potential for align- 
ing the loop bodies, where relational invariants may be easier to establish than 
a functional loop invariant. Examples of techniques that exploit opportunities 
for such alignment include use of type-based analysis with self-composition [29], 
allowing flexibility in composition to be a mix of sequential and parallel [6], 
exploiting structurally equivalent programs for compiler validation [32], lockstep 
execution of loops in reasoning using Cartesian Hoare Logic [27], and merging 
Horn clause rules for relational verification [13,24]. 

In this paper, we present a compositional framework that leverages rela- 
tional specifications to further simplify the generated verification tasks on the 
composed program. Our framework is motivated by two main strategies. The 
first strategy, similar to the efforts mentioned above, is to exploit opportunities 
for synchrony, i.e., aligning code fragments across which relational invariants 
are easy to derive, perhaps due to functional similarity or due to similar code 
structure, etc. Specifically, we choose to synchronize the programs at conditional 
blocks as well as at loops. Similar to closely related efforts [6,27], we would like 
to execute loops in lockstep so that relational invariants can be derived over 
corresponding iterations over the loop bodies. Specifically, we propose a novel 
technique that analyzes the relational specifications to infer, under reasonable 
assumptions, maximal sets of loops that can be executed in lockstep. Synchronizing
at conditional blocks in addition to loops enables simplification due to
relational specifications and conditional guards that might result in infeasible 
or redundant subtasks. Pruning of such infeasible subtasks has been performed 
and noted as important in existing work [27], and synchronizing at conditional 
blocks allows us to prune eagerly. More importantly, aligning different programs 
at conditional statements sets up our next strategy. 

Our second strategy is the exploitation of symmetry in relational specifica- 
tions. Due to control flow divergences or non-lockstep executions of loops, even 
different copies of the same program may proceed along different code fragments. 
However, some of the resulting verification subtasks may be indistinguishable 
from each other due to underlying symmetries among related fragments. We 
analyze the relational specifications, expressed as formulas in first-order theories 
(e.g., linear integer arithmetic) with multi-index variables, to discover symme- 
tries and exploit them to prune away redundant subtasks. Prior works on use 
of symmetry in model checking [11,14,15,20] are typically based on symmet- 
ric states satisfying the same set of indexed atomic propositions, and do not 
consider symmetries among different indices in specifications. To the best of 
our knowledge, ours is the first work to extract such symmetries in relational 
specifications, and to use them for pruning redundant subtasks during rela- 
tional verification. For extracting these symmetries, we have lifted core ideas 
from symmetry-discovery and symmetry-breaking in SAT formulas [12] to richer 
formulas in first-order theories. 

The strategies we propose for exploiting synchrony and symmetry via relational
specifications are fairly general in that they can be employed in various
verification methods. We provide a generic logic-based description of these
strategies at a high level (Sect. 4), and also describe a specific instantiation
in a verification algorithm based on forward analysis that computes strongest
postconditions (Sect. 5). We have implemented our approach in a prototype tool
called SYNONYM built on top of the DESCARTES tool [27]. Our experimental
evaluation (Sect. 6) shows the effectiveness of our approach in improving the
performance of verification in many examples (and a marginal overhead in smaller
examples). In particular, exploiting symmetry is crucial in enabling verification
to complete for some properties, without which DESCARTES exceeds a timeout
on all benchmark examples.


166 L. Pick et al.


Program (left):

    if (y_j > 20) {
      while (i_j < 10) {
        x_j *= i_j;
        i_j++;
      }
    } else {
      while (i_j < 10) {
        x_j++;
        i_j++;
      }
    }

Control-flow decisions (right):

    $y_1 > 20 \land y_2 > 20 \land y_3 > 20$
    $y_1 > 20 \land y_2 > 20 \land y_3 \le 20$
    $y_1 > 20 \land y_2 \le 20 \land y_3 > 20$
    $y_1 > 20 \land y_2 \le 20 \land y_3 \le 20$
    $y_1 \le 20 \land y_2 > 20 \land y_3 > 20$
    $y_1 \le 20 \land y_2 > 20 \land y_3 \le 20$
    $y_1 \le 20 \land y_2 \le 20 \land y_3 > 20$
    $y_1 \le 20 \land y_2 \le 20 \land y_3 \le 20$

Fig. 1. Example program (left), and eight possible control-flow decisions (right).


2 Motivating Example 


Consider three C-like integer programs $\{P_j\}$ of the form shown in Fig. 1 (left).
They are identical modulo renaming, and we use indices $j \in \{1, 2, 3\}$ as subscripts
to denote variables in the different copies. We assume that each variable
initially takes a nondeterministic value in each program.

A relational verification problem (RVP) is a tuple consisting of programs
$\{P_j\}$, a relational precondition pre, and a relational postcondition post. In the
example RVPs below, we consider the three conditionals, which in turn lead to
eight possible control-flow decisions (Fig. 1, right) in a composed program. Each
RVP reduces to subproblems for proving that post can be derived from pre for
each of these control-flow decisions. In the rest of the section, we demonstrate
the underlying ideas behind our approach to solve these subproblems efficiently.


Maximizing Lockstep Execution. Given an RVP (referred to as $RVP_1$) with
precondition $x_1 < x_3 \land x_1 > 0 \land i_1 > 0 \land i_2 \ge i_1 \land i_1 = i_3$ (pre) and postcondition
$(x_1 < x_3 \lor y_1 \ne y_3) \land i_1 > 0 \land i_2 \ge i_1 \land i_1 = i_3$ (post), consider a control-flow
decision $y_1 > 20 \land y_2 > 20 \land y_3 > 20$. This leads to another RVP, consisting of
three programs of the following form:


    assume(y_j > 20); while (i_j < 10) {x_j *= i_j; i_j++;}


where $j \in \{1, 2, 3\}$, and the aforementioned pre and post. From pre, it follows
that $i_1 = i_3$ and $i_2 \ge i_1$. We can thus infer that the first and third loops
are always executed the same number of times, while the second loop may be
executed for fewer iterations. This knowledge lets us infer a single relational
invariant for the first and third loops and handle the second loop separately.
Clearly, the relational invariant $x_1 < x_3 \land i_1 = i_3 \land i_1 \le 10$ and the
non-relational invariant $i_2 \le 10$ are enough to derive post. If we were to handle the
first and third loop separately, we would need complex nonlinear invariants such
as $x_1 = \frac{x_{1,init} \times i_1!}{i_{1,init}!}$ and $x_3 = \frac{x_{3,init} \times i_3!}{i_{3,init}!}$, which involve auxiliary variables $x_{j,init}$
and $i_{j,init}$ denoting the initial values of $x_j$ and $i_j$, respectively.
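The benefit of lockstep execution for the first and third loops can be observed with a small simulation. The Python sketch below is an illustration only and not part of the described tool: it runs the two loop copies side by side and checks the relational part of the invariant, $x_1 < x_3 \land i_1 = i_3$, before every joint iteration and at exit. The function name `run_lockstep` is hypothetical, and callers are assumed to supply inputs that satisfy the relevant part of pre ($x_1 < x_3$, $i_1 > 0$, $i_1 = i_3$).

```python
def run_lockstep(x1, i1, x3, i3, bound=10):
    """Run copies 1 and 3 of `while (i < bound) { x *= i; i++; }` in lockstep.

    Inputs are assumed to satisfy x1 < x3, i1 > 0 and i1 == i3.
    """
    while i1 < bound and i3 < bound:
        # Relational invariant holds before each joint iteration.
        assert x1 < x3 and i1 == i3
        x1, i1 = x1 * i1, i1 + 1
        x3, i3 = x3 * i3, i3 + 1
    # Lockstep is justified: both guards fail at the same time.
    assert not (i1 < bound) and not (i3 < bound)
    # The relational facts needed for post are still available at exit.
    assert x1 < x3 and i1 == i3
    return x1, i1, x3, i3


final = run_lockstep(1, 8, 2, 8)
```

Starting from $x_1 = 1$, $x_3 = 2$, $i_1 = i_3 = 8$, both loops run twice and `final` is `(72, 10, 144, 10)`. No factorial-style closed form is needed, because multiplying both sides of $x_1 < x_3$ by the same positive $i$ preserves the ordering.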


Symmetry-Breaking. For the same program, and an RVP (referred to as $RVP_2$)
with precondition $i_1 > 0 \land i_2 \ge i_1 \land i_1 = i_3$ and postcondition $i_1 > 0 \land i_2 \ge i_1 \land i_1 = i_3$,
consider a control-flow decision $y_1 > 20 \land y_2 > 20 \land y_3 \le 20$. We
generate another RVP involving the following set of programs:


    assume(y_1 > 20); while (i_1 < 10) {x_1 *= i_1; i_1++;}
    assume(y_2 > 20); while (i_2 < 10) {x_2 *= i_2; i_2++;}
    assume(y_3 <= 20); while (i_3 < 10) {x_3++; i_3++;}


Similarly, decision $y_1 \le 20 \land y_2 > 20 \land y_3 > 20$ generates yet another RVP over
the following:


    assume(y_1 <= 20); while (i_1 < 10) {x_1++; i_1++;}
    assume(y_2 > 20); while (i_2 < 10) {x_2 *= i_2; i_2++;}
    assume(y_3 > 20); while (i_3 < 10) {x_3 *= i_3; i_3++;}


Both RVPs have the same precondition and postcondition as $RVP_2$. We can
see that both RVPs differ only in their subscripts; by taking one and swapping
the subscripts 1 and 3 due to symmetry, we arrive at the other. Thus, by
discovering and exploiting such symmetries, knowing the verification result for
either RVP allows us to skip verifying the other one.


3 Background and Notation 


Given a loop-free program over input variables $\vec{x}$ and output variables $\vec{y}$ (such
that $\vec{x}$ and $\vec{y}$ are disjoint), let $Tr(\vec{x}, \vec{y})$ denote its symbolic encoding.


Proposition 1. Given two loop-free programs, $Tr_1(\vec{x}_1, \vec{y}_1)$ and $Tr_2(\vec{x}_2, \vec{y}_2)$, a
precondition $pre(\vec{x}_1, \vec{x}_2)$, and a postcondition $post(\vec{y}_1, \vec{y}_2)$, the task of relational
verification is reduced to checking validity of the following formula.


$$pre(\vec{x}_1, \vec{x}_2) \land Tr_1(\vec{x}_1, \vec{y}_1) \land Tr_2(\vec{x}_2, \vec{y}_2) \implies post(\vec{y}_1, \vec{y}_2)$$
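As an illustration of Proposition 1, the validity check can be mimicked by exhaustive enumeration over a small finite domain. The Python sketch below is a stand-in for an SMT validity query, not the method used by the tool; the helper name `relational_vc_holds`, the scalar (one-variable) programs, and the example pre and post are hypothetical.

```python
from itertools import product


def relational_vc_holds(pre, tr1, tr2, post, domain):
    """Check pre(x1,x2) and Tr1(x1,y1) and Tr2(x2,y2) implies post(y1,y2)
    for all values drawn from a finite domain (brute force, scalars only)."""
    for x1, x2, y1, y2 in product(domain, repeat=4):
        if pre(x1, x2) and tr1(x1, y1) and tr2(x2, y2) and not post(y1, y2):
            return False  # found a counterexample to the implication
    return True


# Two copies of the loop-free program `y = 2 * x + 1` (hypothetical example).
def tr(x, y):
    return y == 2 * x + 1


domain = range(-5, 6)
monotone = relational_vc_holds(lambda a, b: a < b, tr, tr, lambda a, b: a < b, domain)
equal = relational_vc_holds(lambda a, b: a < b, tr, tr, lambda a, b: a == b, domain)
```

Here `monotone` is `True` (smaller inputs give smaller outputs) and `equal` is `False`. A real verifier would discharge the same implication symbolically over unbounded integers.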


Given a program with one loop (i.e., a transition system) over input variables
$\vec{x}$ and output variables $\vec{y}$, let $Init(\vec{x}, \vec{u})$ denote a symbolic encoding of the block
of code before the loop, $Guard(\vec{u})$ denote the loop guard, and $Tr(\vec{u}, \vec{y})$ encode
the loop body. Here, $\vec{u}$ is the vector of local variables that are live at the loop
guard. For example, consider the program from our motivating example:


    assume(y_1 > 20); while (i_1 < 10) {x_1 *= i_1; i_1++;}


In its encoding, $\vec{x} = \vec{u} = (i_1, x_1, y_1)$, $\vec{y} = (i'_1, x'_1)$, $Init(\vec{x}, \vec{u}) = y_1 > 20$,
$Guard(\vec{u}) = i_1 < 10$, and $Tr(\vec{u}, \vec{y}) = x'_1 = x_1 \times i_1 \land i'_1 = i_1 + 1$.


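Proposition 2 (Naive parallel composition). Given two loopy programs,
$\langle Init(\vec{x}_1, \vec{u}_1), Guard(\vec{u}_1), Tr(\vec{u}_1, \vec{y}_1) \rangle$ and $\langle Init(\vec{x}_2, \vec{u}_2), Guard(\vec{u}_2), Tr(\vec{u}_2, \vec{y}_2) \rangle$, a
precondition $pre(\vec{x}_1, \vec{x}_2)$, and a postcondition $post(\vec{y}_1, \vec{y}_2)$, the task of relational
verification is reduced to the task of finding (individual) inductive invariants $I_1$
and $I_2$:


$$pre(\vec{x}_1, \vec{x}_2) \land Init(\vec{x}_1, \vec{u}_1) \implies I_1(\vec{u}_1)$$
$$I_1(\vec{u}_1) \land Guard_1(\vec{u}_1) \land Tr_1(\vec{u}_1, \vec{y}_1) \implies I_1(\vec{y}_1)$$
$$pre(\vec{x}_1, \vec{x}_2) \land Init(\vec{x}_2, \vec{u}_2) \implies I_2(\vec{u}_2)$$
$$I_2(\vec{u}_2) \land Guard_2(\vec{u}_2) \land Tr_2(\vec{u}_2, \vec{y}_2) \implies I_2(\vec{y}_2)$$
$$I_1(\vec{y}_1) \land I_2(\vec{y}_2) \land \neg Guard_1(\vec{y}_1) \land \neg Guard_2(\vec{y}_2) \implies post(\vec{y}_1, \vec{y}_2)$$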


Note that the method of naive composition requires handling of multiple
invariants, which is known to be difficult. Furthermore, it might lose some important
relational information specified in $pre(\vec{x}_1, \vec{x}_2)$. One way to avoid this is to
exploit the fact that loops could be executed in lockstep.


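Proposition 3 (Lockstep composition). Given two loopy programs,
$\langle Init(\vec{x}_1, \vec{u}_1), Guard(\vec{u}_1), Tr(\vec{u}_1, \vec{y}_1) \rangle$ and $\langle Init(\vec{x}_2, \vec{u}_2), Guard(\vec{u}_2), Tr(\vec{u}_2, \vec{y}_2) \rangle$, a
precondition $pre(\vec{x}_1, \vec{x}_2)$, and a postcondition $post(\vec{y}_1, \vec{y}_2)$, if both loops iterate
exactly the same number of times, then the task of relational verification
is reduced to the task of finding one (relational) inductive invariant $I$:


$$pre(\vec{x}_1, \vec{x}_2) \land Init(\vec{x}_1, \vec{u}_1) \land Init(\vec{x}_2, \vec{u}_2) \implies I(\vec{u}_1, \vec{u}_2)$$
$$I(\vec{u}_1, \vec{u}_2) \land Guard_1(\vec{u}_1) \land Tr_1(\vec{u}_1, \vec{y}_1) \land Guard_2(\vec{u}_2) \land Tr_2(\vec{u}_2, \vec{y}_2) \implies I(\vec{y}_1, \vec{y}_2)$$
$$I(\vec{y}_1, \vec{y}_2) \land \neg Guard_1(\vec{y}_1) \land \neg Guard_2(\vec{y}_2) \implies post(\vec{y}_1, \vec{y}_2)$$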


In this paper, we do not focus on a specific method for deriving these invari- 
ants — a plethora of suitable methods have been proposed in the literature, and 
any of these could be used. 


4 Leveraging Relational Specifications 


In this section, we describe the main components of our compositional framework 
where we leverage relational specifications to simplify the verification subtasks. 
We first describe our novel algorithm for inferring maximal sets of loops that 
can be executed in lockstep (Sect. 4.1). Next, we describe our technique for han- 
dling conditionals (Sect. 4.2). While this is similar to other prior work, the main 
purpose here is to set the stage for our novel methods for exploiting symmetry 
(Sect. 4.3). 


Exploiting Synchrony and Symmetry in Relational Verification 169 


4.1 Synchronizing Loops 


Given a set of loopy programs, we would like to determine which ones can be 
executed in lockstep. As mentioned earlier, relational invariants over lockstep 
loops are often easier to derive than loop invariants over a single copy. 

Our algorithm CHECKLOCKSTEP takes as input a set of loopy programs
$\{P_1, \ldots, P_k\}$ and outputs a set of maximal classes of programs that can be
executed in lockstep. The algorithm partitions its input set of programs and
recursively calls CHECKLOCKSTEP on the partitions.

First, CHECKLOCKSTEP infers a relational inductive invariant over the loop
bodies, synthesizing $I(\vec{u}_1, \ldots, \vec{u}_k)$ in the following:

$$pre(\vec{x}_1, \ldots, \vec{x}_k) \land \bigwedge_{i=1}^{k} Init(\vec{x}_i, \vec{u}_i) \implies I(\vec{u}_1, \ldots, \vec{u}_k)$$
$$I(\vec{u}_1, \ldots, \vec{u}_k) \land \bigwedge_{i=1}^{k} Guard_i(\vec{u}_i) \land Tr_i(\vec{u}_i, \vec{y}_i) \implies I(\vec{y}_1, \ldots, \vec{y}_k)$$


CHECKLOCKSTEP then poses the following query:


$$\neg \Big( \big( I(\vec{u}_1, \ldots, \vec{u}_k) \land \bigvee_{i=1}^{k} \neg Guard_i(\vec{u}_i) \big) \implies \bigwedge_{i=1}^{k} \neg Guard_i(\vec{u}_i) \Big) \qquad (1)$$


The left-hand side of the implication holds whenever one of the loops has
terminated (the relational invariant holds, and at least one of the loop conditions
must be false), and the right-hand side holds only if all of the loops have
terminated. If the formula is unsatisfiable, then the termination of one loop implies
the termination of all loops, and all loops can be executed simultaneously [27].
In this case, the entire set of input programs is one maximal class, and the set
containing the set of all input programs is returned.

Otherwise, CHECKLOCKSTEP gets a satisfying assignment and partitions the
input programs into a set Terminated and a set Unfinished. The Terminated
set contains all programs $P_i$ whose guards $Guard(\vec{u}_i)$ are false in the model
for the formula, and the Unfinished set contains the remaining programs. The
CHECKLOCKSTEP algorithm is then called recursively on both Terminated and
Unfinished, with its final result being the union of the two sets returned by these
recursive calls.
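The recursive partitioning can be sketched in Python as follows. This is a simplified illustration, not the CHECKLOCKSTEP implementation: the satisfiability query of Eq. 1 is replaced by exhaustive search over a small finite domain, each program is abstracted to a single loop counter, and the relational invariant is supplied by a hypothetical callback `invariant` instead of being synthesized.

```python
from itertools import product


def check_lockstep(programs, guards, invariant, domain):
    """Return a list of classes (sets of program ids) that can run in lockstep.

    guards[p](v): loop guard of program p on counter value v.
    invariant(programs, state): relational invariant over the given programs,
    where state maps program id to counter value (hypothetical callback).
    """
    programs = tuple(programs)
    for values in product(domain, repeat=len(programs)):
        state = dict(zip(programs, values))
        if not invariant(programs, state):
            continue
        terminated = [p for p in programs if not guards[p](state[p])]
        # A model of Eq. 1: some loop has terminated, but not all of them.
        if terminated and len(terminated) < len(programs):
            unfinished = [p for p in programs if p not in terminated]
            return (check_lockstep(terminated, guards, invariant, domain)
                    + check_lockstep(unfinished, guards, invariant, domain))
    return [set(programs)]  # Eq. 1 has no model: one maximal class


# Counters of the three loops from the motivating example, with the relational
# facts i1 = i3 and i2 >= i1 used as the (assumed) invariant.
def invariant(programs, s):
    if 1 in s and 3 in s and s[1] != s[3]:
        return False
    if 1 in s and 2 in s and s[2] < s[1]:
        return False
    return True


guards = {j: (lambda v: v < 10) for j in (1, 2, 3)}
classes = check_lockstep((1, 2, 3), guards, invariant, range(12))
```

On this input the search finds a state in which only the second loop has terminated, so the programs are split into `{2}` and `{1, 3}`, and neither part can be split further: `classes` is `[{2}, {1, 3}]`.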

The following theorem assumes that any relational invariant $I(\vec{u}_1, \ldots, \vec{u}_k)$,
generated externally and used by the algorithm, is stronger than any relational
invariant $I(\vec{u}_1, \ldots, \vec{u}_{i-1}, \vec{u}_{i+1}, \ldots, \vec{u}_k)$ that could be synthesized over the same
set of $k$ loops with the $i$th loop removed.


Theorem 1. For any call to CHECKLOCKSTEP, it always partitions its set of
input programs such that for all $P_i \in$ Terminated and $P_j \in$ Unfinished, $P_i$ and
$P_j$ cannot be executed in lockstep.

Proof. Assume that CHECKLOCKSTEP has partitioned its set of programs into
the Terminated and Unfinished sets. Let $P_i \in$ Terminated, $P_j \in$ Unfinished be
arbitrary programs. Based on how the partitioning is performed, we know that
there is a model for Eq. 1 such that $Guard(\vec{u}_i)$ does not hold and $Guard(\vec{u}_j)$
does. We can thus conclude that the following formula is satisfiable:


$$\neg \big( I(\vec{u}_1, \ldots, \vec{u}_k) \land \neg Guard(\vec{u}_i) \implies \neg Guard(\vec{u}_j) \big)$$


From the assumption on our invariant synthesizer, we conclude that the following
is also satisfiable, indicating that $P_i$ and $P_j$ cannot be executed in lockstep:


$$\neg \big( I(\vec{u}_i, \vec{u}_j) \land \neg Guard(\vec{u}_i) \implies \neg Guard(\vec{u}_j) \big)$$


where $I(\vec{u}_i, \vec{u}_j)$ is the relational invariant for $P_i$ and $P_j$ that our invariant
synthesizer infers.


4.2 Synchronizing Conditionals 


Let two programs have forms if $Q_i$ then $R_i$ else $S_i$, where $i \in \{1, 2\}$ and $R_i$
and $S_i$ are arbitrary blocks of code and could possibly have loops. Let them be a
part of some RVP, which reduces to applying Propositions 1, 2, or 3, depending
on the content of each block of code, to four pairs of programs. As we have seen in
previous sections, each of the four verification tasks could be expensive. In order
to reduce the number of verification tasks where possible, we use the relational
preconditions to filter out pairs of programs for which verification conclusions
can be derived trivially.

For $k$ programs of the form if $Q_i$ then $R_i$ else $S_i$ for $i \in \{1, \ldots, k\}$ and
precondition $pre(\vec{x}_1, \ldots, \vec{x}_k)$, we can simultaneously generate all possible
combinations of decisions by querying a solver for all truth assignments to the $Q_i$s:


$$pre(\vec{x}_1, \ldots, \vec{x}_k) \land \bigwedge_{i=1}^{k} Q_i \qquad (2)$$


We can then use the result of this All-SAT query to generate sets of programs
in subtasks. For each assignment $j$, where each $Q_i$ is assigned a Boolean value $v_i$,
the following set is generated: {assume($V_1$); $U_1$, ..., assume($V_k$); $U_k$} where
for each $i \in \{1, \ldots, k\}$, if $v_i$ = true, then $V_i = Q_i$ and $U_i = R_i$, else $V_i = \neg Q_i$
and $U_i = S_i$. We need to apply our verification algorithm on only the resulting
sets of programs. For example, in our above RVP, if $Q_1$ is equivalent to $Q_2$ in all
solutions, then the RVP reduces to verification of just two pairs of programs:


    assume($Q_1$); $R_1$ and assume($Q_2$); $R_2$
    assume($\neg Q_1$); $S_1$ and assume($\neg Q_2$); $S_2$
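The effect of the All-SAT query can be imitated on small examples by enumerating inputs and recording which combinations of branch decisions are consistent with the precondition. The Python sketch below is such a brute-force stand-in for the solver query; the function name `feasible_decisions`, the one-input-per-program abstraction, and the example preconditions are hypothetical.

```python
from itertools import product


def feasible_decisions(pre, conditions, domain):
    """Enumerate truth assignments to the branch conditions Q_1..Q_k that are
    consistent with the relational precondition.

    pre: predicate over a tuple of inputs (one scalar per program).
    conditions: one predicate Q_i per program, applied to that program's input.
    """
    decisions = set()
    for inputs in product(domain, repeat=len(conditions)):
        if pre(inputs):
            decisions.add(tuple(q(v) for q, v in zip(conditions, inputs)))
    return decisions


def above_20(y):
    return y > 20


# Two copies whose inputs are related by y1 == y2: only 2 of 4 decisions remain.
pairs = feasible_decisions(lambda ys: ys[0] == ys[1], [above_20, above_20], range(18, 24))

# Three copies with no constraint on the y_j: all 8 decisions remain.
triples = feasible_decisions(lambda ys: True, [above_20] * 3, range(18, 24))
```

With the precondition `y1 == y2`, `pairs` is `{(True, True), (False, False)}`, which corresponds to the two pairs of programs listed above; without any relation between the inputs, `triples` contains all eight decisions of Fig. 1.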
Algorithm 1. Algorithm for constructing a graph to find symmetries.
1: procedure MAKEGRAPH(F)
2:     $(V, E) \leftarrow (\{v^{Id}_1, \ldots, v^{Id}_k\}, \emptyset)$ where each $v^{Id}_i$ has $color(v^{Id}_i) = Id$
3:     for $d \in$ CLAUSES(F) do $(V, E) \leftarrow$ MAKECOLOREDAST(d) $\cup\ (V, E)$
4:     for $v \in V$ with $x_i \in vars(color(v))$ do
5:         $V \leftarrow (V \setminus \{v\}) \cup \{$RECOLOR$(v, v[x_i \mapsto x])\}$
6:         $E \leftarrow E \cup \{(v, v^{Id}_i)\}$


Fig. 2. Graph with vertex names (outside the vertices) and colors (inside the vertices).


4.3 Discovering and Exploiting Symmetries 


Using the All-SAT query from Eq. 2 allows us to prune trivial RVPs. However,
as we have seen in Sect. 2, some of the remaining RVPs could be regarded as
equivalent due to symmetry. First, we discuss how to identify symmetries in 
formulas syntactically, and then we show how to use such symmetries. 


4.3.1 Identifying Symmetries in Formulas 
Formally, symmetries in formulas are defined as permutations. Note that any
permutation $\pi$ of set $\{1, \ldots, k\}$ can be lifted to be a permutation of set $\{\vec{x}_1, \ldots, \vec{x}_k\}$.


Definition 1 (Symmetry). Let $\vec{x}_1, \ldots, \vec{x}_k$ be vectors of the same size over
disjoint sets of variables. A symmetry $\pi$ of a formula $F(\vec{x}_1, \ldots, \vec{x}_k)$ is a permutation
of set $\{\vec{x}_i \mid 1 \le i \le k\}$ such that $F(\vec{x}_1, \ldots, \vec{x}_k) \iff F(\pi(\vec{x}_1), \ldots, \pi(\vec{x}_k))$.


The task of finding symmetries within a set of formulas can be performed
syntactically by first canonicalizing the formulas, converting the formulas into
a graph representation of their syntax, and then using a graph automorphism
algorithm to find the symmetries of the graph. We demonstrate how this can be
done for a formula $\varphi$ over Linear Integer Arithmetic with the following example.

Let $\varphi = (x_1 \le x_2 \land x_3 \le x_4) \land (x_1 < z_2 \lor x_3 < z_4)$. Note that this formula
is symmetric under a permutation of the subscripts that simultaneously swaps
1 with 3 and 2 with 4. Let $\{(x_1, z_1), (x_2, z_2), (x_3, z_3), (x_4, z_4)\}$ be the vectors of
variables. We identify a vector by its subscript (e.g., we identify $(x_1, z_1)$ by 1).
Our algorithm starts with canonicalizing the formula: $\varphi = (x_1 < x_2 \lor x_1 = x_2) \land
(x_3 < x_4 \lor x_3 = x_4) \land (x_1 < z_2 \lor x_3 < z_4)$. It then constructs a colored graph
for the canonicalized formula with the procedure in Algorithm 1. The algorithm
initializes a graph by the set of $k$ vertices $v^{Id}_1, \ldots, v^{Id}_k$ with color Id (vertices
21–24 in Fig. 2), where $k$ is the number of identifiers. It then (Line 3) adds to the
graph the union of the abstract syntax trees (AST) for the formula’s conjuncts,
where each vertex has a color corresponding to the type of its AST node. If a
parent vertex has a color of an ordering-sensitive operation or predicate, then
the children should have colors that include a tag to indicate their ordering (e.g.,
vertices 9 and 10 in Fig. 2 have colors with tags because their parent has color
<, but vertices 11 and 12 do not have tags because their parent has color =).
Next (Line 4), the algorithm performs an appropriate renaming of vertex colors
so that each indexed variable name $x_i$ is replaced with a non-indexed version
$x$, while simultaneously adding edges from each vertex with a renamed color to
$v^{Id}_i$. The resulting graph for $\varphi$ is shown in Fig. 2. Finally, the algorithm applies
a graph automorphism finder to get the following automorphism (in addition to
the identity automorphism), which is shown here in a cyclic notation where (x y)
means that $x \mapsto y$ and $y \mapsto x$ (vertices that map to themselves are omitted):


(0 1)(3 5)(4 6)(7 8)(9 13)(10 14)(11 15)(12 16)(17 19)(18 20)(21 23)(22 24)


We are only interested in permutations of the vectors, so we project out the
relevant parts of the permutation (21 23)(22 24) and map them back to our
vector identifiers to get the following permutation on the identifiers:


$$\pi = \{1 \mapsto 3,\ 2 \mapsto 4,\ 3 \mapsto 1,\ 4 \mapsto 2\}$$
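For small formulas, the symmetries of Definition 1 can also be found semantically by trying every permutation of the vector indices and comparing truth values on a finite domain. The Python sketch below does this for the example formula; it is a brute-force illustration rather than the syntactic graph-automorphism procedure of Algorithm 1. The function names are hypothetical, and the reading of the example as $(x_1 \le x_2 \land x_3 \le x_4) \land (x_1 < z_2 \lor x_3 < z_4)$ over vectors $(x_i, z_i)$ is an assumption about the formula intended in the text.

```python
from itertools import permutations, product


def phi(x, z):
    # x and z map a vector index (1..4) to an integer value (assumed formula).
    return (x[1] <= x[2] and x[3] <= x[4]) and (x[1] < z[2] or x[3] < z[4])


def symmetries(formula, k, domain):
    """Return all index permutations pi such that the formula has the same
    truth value on (vec_1..vec_k) and (vec_pi(1)..vec_pi(k)) over the domain."""
    indices = range(1, k + 1)
    assignments = list(product(domain, repeat=2 * k))
    found = []
    for perm in permutations(indices):
        pi = dict(zip(indices, perm))
        ok = True
        for vals in assignments:
            x = dict(zip(indices, vals[:k]))
            z = dict(zip(indices, vals[k:]))
            # Position i of the permuted formula receives vector pi(i).
            px = {i: x[pi[i]] for i in indices}
            pz = {i: z[pi[i]] for i in indices}
            if formula(x, z) != formula(px, pz):
                ok = False
                break
        if ok:
            found.append(pi)
    return found


syms = symmetries(phi, 4, (0, 1, 2))
```

Under the stated assumptions, `syms` contains exactly the identity and the permutation that swaps 1 with 3 and 2 with 4. The enumeration grows factorially in the number of vectors, which is one reason to prefer the graph-based approach.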


4.3.2 Exploiting Symmetries 
We now define the notion of symmetric RVPs and application of symmetry- 
breaking to generate a single representative per equivalence class of RVPs. 


Definition 2 (Symmetric RVPs). Two RVPs $\langle Ps, pre(\vec{x}_1, \ldots, \vec{x}_k),
post(\vec{y}_1, \ldots, \vec{y}_k) \rangle$ and $\langle Ps', pre(\vec{x}_1, \ldots, \vec{x}_k), post(\vec{y}_1, \ldots, \vec{y}_k) \rangle$, where $Ps =
\{P_1, \ldots, P_k\}$ and $Ps' = \{P'_1, \ldots, P'_k\}$, are called symmetric under a
permutation $\pi$ iff


1. $\pi$ is a symmetry of formula $pre(\vec{x}_1, \ldots, \vec{x}_k) \land post(\vec{y}_1, \ldots, \vec{y}_k)$

2. for every $P_i \in Ps$ and $P'_j \in Ps'$, if $\pi(i) = j$, then $P_i$ and $P'_j$ have the same
number of inputs and outputs and have logically equivalent encodings for the
same set of input variables $\vec{x}_i$ and output variables $\vec{y}_i$


As we have seen in Sect. 4.3.1, identification of symmetries could be made
purely on the syntactic level of the relational preconditions and postconditions.
For each detected symmetry, it remains to check equivalence between the
corresponding programs' encodings, which can be formulated as an SMT problem.




To exploit symmetries, we propose a simple but intuitive approach. First,
we identify the set of symmetries using $pre \land post$. Then, we solve the All-SAT
query from Eq. 2 and get a reduced set $R$ of RVPs (i.e., one without all trivial
problems). For each $RVP_i \in R$, we perform the relational verification only if no
symmetric $RVP_j \in R$ has already been verified. Thus, the most expensive part
of the routine, checking equivalence of RVPs, is performed on demand and only
on a subset of all possible pairs $\langle RVP_i, RVP_j \rangle$.

Alternatively, in some cases (e.g., for parallelizing the algorithm) it might 
help to identify all symmetric RVPs prior to solving the All-SAT query from 
Eq. 2. From this set, we can generate symmetry-breaking predicates (SBPs) [12] 
and conjoin them to Eq. 2. Constrained with SBPs, this query will have fewer 
models, and will contain a single representative per equivalence class of RVPs. 
We describe how to construct SBPs in more detail in the next section. 


4.3.3 Generating Symmetry-Breaking Predicates (SBPs)

SBPs have previously been applied in pruning the search space explored by SAT 
solvers. Traditionally, techniques construct SBPs based on symmetries in truth 
assignments to the literals in the formula, but SBP-construction can be adapted 
to be based on symmetries in truth assignments to conditionals, allowing us to 
break symmetries in our setting. 

We can construct an SBP by treating each condition the way a literal is 
treated in existing SBP constructions. In particular, we can construct the common 
Lex-Leader SBP used for predicate logic [12], which in our case will force a 
solver to choose the lexicographically least representative per equivalence class 
for a particular ordering of the conditions. For the ordering of conditions where 
$Q_i \le Q_j$ iff $i < j$ and a set of symmetries $S$ over $\{1, \ldots, k\}$, we can construct 
a Lex-Leader SBP $SBP(S) = \bigwedge_{\pi \in S} PP(\pi)$ with the more efficient predicate 
chaining construction [2], where we have that 


$$PP(\pi) = p_{\min(I)} \wedge \bigwedge_{i \in I} p_i \Rightarrow g_{\mathit{prev}(i,I)} \Rightarrow l_i \wedge p_{\mathit{next}(i,I)}$$

and that $I$ is the support of $\pi$ with the last condition for each cycle removed, 
$\min(I)$ is the minimal element of $I$, $\mathit{prev}(i, I)$ is the maximal element of $I$ still 
less than $i$ or 0 if there is none, $\mathit{next}(i, I)$ is the minimal element of $I$ still greater 
than $i$ or 0 if there is none, $p_0 = g_0 = true$, $p_i$ is a fresh predicate for $i \ne 0$, 
$g_i = Q_{\pi(i)} \Rightarrow Q_i$ for $i \ne 0$, and $l_i = Q_i \Rightarrow Q_{\pi(i)}$. 

After constructing the SBP, we conjoin it to the All-SAT query in Eq. 2. Our 
solver now generates sets of programs that, when combined with the relational 
precondition and postcondition, form a set of irredundant RVPs. 


Example. Let us consider how SBPs can be applied to RVP» from Sect.2 to 
avoid generating two of the eight RVPs we would otherwise generate. 

First, we see that our three programs are all copies of the same program and 
are at the same program point, so they will have the same encoding. Next, we 
find the set of permutations $S$ over $\{1,2,3\}$ such that for each $\pi \in S$, we have 
that $i_1 > 0 \wedge i_2 \ge i_1 \wedge i_1 = i_3$ iff $i_{\pi(1)} > 0 \wedge i_{\pi(2)} \ge i_{\pi(1)} \wedge i_{\pi(1)} = i_{\pi(3)}$. In this 
case, we have that $S$ is the set of permutations $\{\{1 \mapsto 1, 2 \mapsto 2, 3 \mapsto 3\}, \{1 \mapsto 3, 2 \mapsto 2, 3 \mapsto 1\}\}$. 
Now, we construct a Lex-Leader SBP (using the predicate 
chaining construction described above): 


$$p_1 \wedge (p_1 \Rightarrow ((y_1 > 20) \Rightarrow (y_3 > 20)))$$


where $p_1$ is a fresh predicate. Conjoining this SBP to Eq. 2 leads to the RVPs 
arising from the control-flow decisions $y_1 > 20 \wedge y_2 > 20 \wedge y_3 \le 20$ and 
$y_1 > 20 \wedge y_2 \le 20 \wedge y_3 \le 20$ no longer being generated. 
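
The effect of the SBP in this example can be reproduced by brute force. The following is a minimal sketch, not the predicate chaining construction itself: it enumerates all eight truth assignments to the three branch conditions and keeps an assignment only if it is lexicographically no larger than each of its permuted copies (with false ordered before true). The function name `lex_leader_filter` and the 0-based encoding of permutations are assumptions made for illustration.

```python
from itertools import product


def lex_leader_filter(assignments, perms):
    """Keep an assignment only if it is lexicographically <= every permuted
    copy of itself (Python orders False before True)."""
    kept = []
    for a in assignments:
        if all(a <= tuple(a[p[i]] for i in range(len(a))) for p in perms):
            kept.append(a)
    return kept


# Truth values of the three branch conditions (y1 > 20, y2 > 20, y3 > 20).
all_decisions = list(product([False, True], repeat=3))

# Symmetries from the example, 0-based: the identity and the swap of the
# first and third program copies.
symmetries = [(0, 1, 2), (2, 1, 0)]

kept = lex_leader_filter(all_decisions, symmetries)
dropped = [a for a in all_decisions if a not in kept]

for a in dropped:
    print("pruned control-flow decision:", a)
```

Six of the eight assignments survive; the two pruned ones are exactly those with the first condition true and the third false, which is what the implication between the first and third conditions rules out.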


5 Instantiation of Strategies in Forward Analysis 


We now describe an instantiation of our proposed strategies in a verification algo- 
rithm based on forward analysis using a strongest-postcondition computation. 
Other instantiations, e.g., on top of a Horn solver based on Property-Directed 
Reachability [24] are possible, but outside the scope of this work. 


procedure VERIFY(pre, Current, Ifs, Loops, post) 
    while Current ≠ ∅ do 
        if PROCESSSTATEMENT(pre, P_i, Ifs, Loops, post) = safe then return safe 
    if Loops ≠ ∅ then HANDLELOOPS(pre, Loops, post) 
    else if Ifs ≠ ∅ then HANDLEIFS(pre, Ifs, Loops, post) 
    else return unsafe 


Given an RVP in the form of a Hoare triple $\{Pre\}\ P_1 \| \cdots \| P_k\ \{Post\}$, where $\|$ 
denotes parallel composition, the top-level VERIFY procedure takes as input the 
relational specification $pre = Pre$ and $post = Post$, the set of input programs 
$Current = \{P_1, \ldots, P_k\}$, and empty sets Loops and Ifs. It uses a strongest- 
postcondition computation to compute the next Hoare triple at each step until 
it can conclude the validity of the original Hoare triple. 


Synchronization. Throughout verification, the algorithm maintains three dis- 
joint sets of programs: one for programs that are currently being processed 
(Current), one for programs that have been processed up until a loop (Loops), 
and one for programs that have been processed up until a conditional statement 
(Ifs). The algorithm processes statements in each program independently, with 
PROCESSSTATEMENT choosing an arbitrary interleaving of statements from the 
programs in Current. When the algorithm encounters the end of a program in 
its call to PROCESSSTATEMENT, it removes this program from the Current set. 
At this point, the algorithm returns safe if the current Hoare triple is proven 
valid. When a program has reached a point of control-flow divergence and is 
processed by PROCESSSTATEMENT, it is removed from Current and added to 
the appropriate set (Loops or Ifs). 
Handling Loops. Once all programs are in the Loops or Ifs sets (i.e., $Current = \emptyset$), 
the algorithm handles the programs in the Loops set if it is nonempty. 
HANDLELOOPS behaves like CHECKLOCKSTEP but computes postconditions 
where possible; when a set of loops can be executed in lockstep, 
HANDLELOOPS computes their postconditions before placing the programs into 
the Terminated set. After all loops have been placed in the Terminated set and 
a new precondition pre' has been computed, rather than returning Terminated, 
HANDLELOOPS invokes VERIFY(pre', Terminated, Ifs, ∅, post). 


Handling Conditionals. When $Current = Loops = \emptyset$, VERIFY handles conditional 
statements. HANDLEIFS exploits symmetries by using the All-SAT query 
with Lex-Leader SBPs as described in Sect. 4 and calls VERIFY on each generated 
verification problem. 


6 Implementation and Evaluation 


To evaluate the effectiveness of increased lockstep execution of loops and 
symmetry-breaking, we implemented our algorithm from Sect. 5 on top of the 
DESCARTES tool for verifying k-safety properties, i.e., RVPs over k identical 
Java programs. We implemented two variants: SYN uses only synchrony (i.e., no 
symmetry is used), while SYNONYM uses both. All implementations (including 
DESCARTES) use the same guess-and-check invariant generator (the same origi- 
nally used by DESCARTES, but modified to generate more candidate invariants). 
In SYNONYM, we compute symmetries in preconditions and postconditions only 
when all program copies are the same. For our examples, it sufficed to compute 
symmetries simply by checking if each possible permutation leads to equivalent 
formulas.¹ We compare the performance of our prototype implementations to 
DESCARTES.² We use two metrics for comparison: the time taken and the number 
of Hoare triples processed by the verification procedure. All experiments 
were conducted on a MacBook Pro, with a 2.7 GHz Intel Core i5 processor and 
8 GB RAM. 


6.1 Stackoverflow Benchmarks 


The first set of benchmarks we consider are the Stackoverflow benchmarks orig- 
inally used to evaluate DESCARTES. These implement (correctly or incorrectly) 
the Java Comparator or Comparable interface, and check whether or not their 
compare functions satisfy the following properties: 


¹ Our implementation includes the syntactic symmetry-finding algorithm from 
Sect. 4.3.1, though we do not use it for evaluation here due to its high overhead 
in using an external tool for finding graph automorphisms. 

² While there are several tools for relational verification (e.g., ROSETTE/UNBOUND [25], 
VERIMAPREL [13], Rêve [17], MoCHi [17], SymDiff [22]), most of these do not 
handle Java programs, and to the best of our knowledge, none of these tools has 
support for k-safety verification for k greater than 2. 
P1: $\forall x, y.\ sgn(compare(x, y)) = -sgn(compare(y, x))$ 
P2: $\forall x, y, z.\ (compare(x, y) > 0 \wedge compare(y, z) > 0) \Rightarrow compare(x, z) > 0$ 
P3: $\forall x, y, z.\ (compare(x, y) = 0) \Rightarrow (sgn(compare(x, z)) = sgn(compare(y, z)))$ 
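
To make the three comparator properties concrete, here is a small sketch that tests them exhaustively over a finite domain. This is bounded testing in Python, not the relational verification performed by the tools in this evaluation, and the names `good_compare` and `bad_compare` are hypothetical examples rather than benchmarks from the suite.

```python
from itertools import product


def sgn(n):
    """Sign of an integer: -1, 0, or 1."""
    return (n > 0) - (n < 0)


def holds_p1(compare, domain):
    return all(sgn(compare(x, y)) == -sgn(compare(y, x))
               for x, y in product(domain, repeat=2))


def holds_p2(compare, domain):
    return all(not (compare(x, y) > 0 and compare(y, z) > 0) or compare(x, z) > 0
               for x, y, z in product(domain, repeat=3))


def holds_p3(compare, domain):
    return all(compare(x, y) != 0 or sgn(compare(x, z)) == sgn(compare(y, z))
               for x, y, z in product(domain, repeat=3))


def good_compare(a, b):
    """A well-behaved three-way comparison on integers."""
    return (a > b) - (a < b)


def bad_compare(a, b):
    """Never returns 0, so compare(x, x) = 1 breaks anti-symmetry (P1)."""
    return 1 if a >= b else -1


domain = range(-2, 3)
for name, cmp in [("good_compare", good_compare), ("bad_compare", bad_compare)]:
    print(name, holds_p1(cmp, domain), holds_p2(cmp, domain), holds_p3(cmp, domain))
```

Note that P1 relates two calls of `compare` while P2 and P3 relate three, which is why they are 2-safety and 3-safety properties respectively.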


(One of the original 34 Stackoverflow examples is excluded from our evalua- 
tion here because of the inability of the invariant generator to produce a suitable 
invariant.) We compare the results of running SYN and SYNONYM vs. DESCARTES 
for each property in Table 1. (Expanded versions and plots of these results are 
available in an extended version of the paper [26].) 

Because property P1 contains a symmetry, we notice an improvement in 
terms of number of Hoare triples with the use of symmetry for this property; 
however, the overhead of computing symmetries leads to SYNONYM performing 
more slowly than SYN even for some examples that exhibit reduced Hoare triple 
counts. Property P1 is also the easiest to prove (all implementations can verify 
each example in under 0.3s), so the overheads contribute more significantly 
to the runtime. For examples on which our implementations do not perform 
as well as DESCARTES, we perform reasonably closely to DESCARTES. These 
examples are typically smaller, and again overheads play a larger role in our 
poorer performance. 


Table 1. Stackoverflow Benchmarks. Total times (in seconds) and Hoare triple counts 
(HTC) for Stackoverflow benchmarks, where for each property, the results for SYN and 
SYNONYM are divided into those for examples where they exhibit a factor of improve- 
ment over DESCARTES that is greater or equal to 1 (top) and those for which they do 
not (bottom). Improv reports the factor of improvement over DESCARTES, where the 
number of examples is given in parentheses. 


Prop | DESCARTES     | SYN                                    | SYNONYM 
     | Time  | HTC   | Time | Improv     | HTC  | Improv      | Time | Improv     | HTC  | Improv 
P1   | 3.11  | 4422  | 1.91 | 1.39 (27)  | 2255 | 1.69 (27)   | 1.82 | 1.32 (25)  | 2401 | 1.82 (32) 
     |       |       | 0.57 | 0.789 (6)  | 752  | 0.809 (6)   | 0.87 | 0.816 (8)  | 48   | 0.979 (1) 
P2   | 24.6  | 13434 | 7.83 | 2.62 (20)  | 3285 | 3.081 (16)  | 7.31 | 2.80 (19)  | 3224 | 3.140 (16) 
     |       |       | 4.98 | 0.823 (13) | 4638 | 0.714 (17)  | 5.1  | 0.816 (14) | 4638 | 0.714 (17) 
P3   | 18.85 | 10938 | 5.22 | 2.92 (20)  | 1565 | 4.36 (16)   | 5.22 | 2.91 (19)  | 1537 | 4.74 (16) 
     |       |       | 6.18 | 0.584 (13) | 6600 | 0.623 (17)  | 6.16 | 0.594 (14) | 6600 | 0.623 (17) 


6.2 Modified Stackoverflow Benchmarks 


The original Stackoverflow examples are fairly small, with all implementations 
taking under 6s to verify any example. To assess how we perform on larger 
examples, we modified several of the larger Stackoverflow comparator examples 
to be longer, take more arguments, and contain more control-flow decisions. 
The resulting functions take three arguments and pick the “largest” object’s id, 
where comparison among objects is performed based on the original Stackover- 
flow example code. (Ties are broken by choosing the least id.) We check whether 
these pick functions satisfy the following properties that allow reordering input 
arguments: 


P13: $\forall x, y, z.\ pick(x, y, z) = pick(y, x, z)$ 
P14: $\forall x, y, z.\ pick(x, y, z) = pick(y, x, z) \wedge pick(x, y, z) = pick(z, y, x)$ 


Note that P13 allows swapping the first two input arguments, while P14 
allows any permutation of inputs, a useful hyperproperty. 
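
The claim that P14 covers every permutation of the inputs follows because the two swaps it mentions (first with second, first with third) generate all six orderings of three arguments. The sketch below checks this on a small finite domain; the `(id, value)` object encoding and both pick functions are hypothetical stand-ins, not the benchmark code.

```python
from itertools import permutations, product


def pick(x, y, z):
    """Id of the object with the largest value; ties go to the least id."""
    return max((x, y, z), key=lambda o: (o[1], -o[0]))[0]


def buggy_pick(x, y, z):
    """On ties, keeps whichever argument came first, so order matters."""
    best = x
    for o in (y, z):
        if o[1] > best[1]:
            best = o
    return best[0]


def holds_p13(f, objs):
    return all(f(x, y, z) == f(y, x, z) for x, y, z in product(objs, repeat=3))


def holds_p14(f, objs):
    return all(f(x, y, z) == f(y, x, z) and f(x, y, z) == f(z, y, x)
               for x, y, z in product(objs, repeat=3))


def invariant_under_all_permutations(f, objs):
    return all(f(*p) == f(x, y, z)
               for x, y, z in product(objs, repeat=3)
               for p in permutations((x, y, z)))


# Objects are (id, value) pairs drawn from a small finite domain.
objects = [(i, v) for i in (1, 2, 3) for v in (0, 1)]
for f in (pick, buggy_pick):
    print(f.__name__, holds_p13(f, objects), holds_p14(f, objects),
          invariant_under_all_permutations(f, objects))
```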

The results from running property P13 are shown in Table 2. We see here 
that for these larger examples, Hoare triple counts are more reliably correlated 
with the time taken to perform verification. SYN outperforms DESCARTES on 14 
of the 16 examples, and SYNONYM outperforms both DESCARTES and SYN on 
all 16 examples. 

The results from running property P14 are shown in Table 3. For this property, 
note that DESCARTES is unable to verify any of the examples within a one-hour 
timeout. Meanwhile, SYN is able to verify 10 of the 16 examples without 
exceeding the timeout. Exploiting symmetries here exhibits an obvious improve- 
ment, with SYNONYM not only being able to verify the same examples as SYN, 
with consistently faster performance on the larger examples, but also being able 
to verify an additional example within an hour. 


Table 2. Verifying P13 for modified Stackoverflow examples. Times (in seconds) and 
Hoare triple counts (HTC). 


Example                       | DESCARTES        | SYN             | SYNONYM 
                              | Time   | HTC     | Time  | HTC     | Time  | HTC 
ArrayInt-pick3-false-simple   | 1.71   | 2573    | 1     | 1355    | 0.64  | 682 
ArrayInt-pick3-false          | 1.55   | 2591    | 1.06  | 1439    | 0.8   | 724 
ArrayInt-pick3-true-simple    | 1.71   | 2573    | 1.03  | 1355    | 0.65  | 682 
ArrayInt-pick3-true           | 1.55   | 2591    | 1.08  | 1439    | 0.81  | 724 
Chromosome-pick3-false-simple | 0.9    | 1115    | 0.9   | 883     | 0.53  | 446 
Chromosome-pick3-false        | 2.51   | 2891    | 2.94  | 3019    | 1.59  | 1514 
Chromosome-pick3-true-simple  | 0.9    | 1115    | 0.9   | 883     | 0.53  | 446 
Chromosome-pick3-true         | 2.51   | 2891    | 2.96  | 3019    | 1.59  | 1514 
PokerHand-pick3-false-part1   | 5.87   | 5825    | 0.42  | 359     | 0.46  | 359 
PokerHand-pick3-false-part2   | 9.74   | 10589   | 0.85  | 323     | 0.86  | 323 
PokerHand-pick3-false         | 16.91  | 16475   | 0.73  | 159     | 0.79  | 159 
PokerHand-pick3-true-part1    | 5.83   | 5825    | 3.98  | 3503    | 2.4   | 1756 
PokerHand-pick3-true-part2    | 9.8    | 10565   | 7.36  | 5933    | 4.53  | 2971 
PokerHand-pick3-true          | 17.25  | 16475   | 12.1  | 9293    | 7.34  | 4651 
Solution-pick3-false          | 76.4   | 99910   | 25.05 | 20645   | 20.42 | 10327 
Solution-pick3-true           | 64.5   | 99910   | 19.66 | 20645   | 15.21 | 10327 
Total                         | 219.64 | 283914  | 82.02 | 74252   | 59.15 | 37605 
Improvement                   | 1      | 1       | 2.68  | 3.8237  | 3.713 | 7.5499 
Table 3. Verifying P14 for modified Stackoverflow examples. Times (in seconds) and 
Hoare triple counts (HTC). TO indicates that the one-hour timeout was exceeded; 
- indicates that no sufficient invariant could be inferred. 


Example                       | DESCARTES   | SYN              | SYNONYM 
                              | Time | HTC  | Time    | HTC    | Time    | HTC 
ArrayInt-pick3-false-simple   | TO   | TO   | 4.12    | 1938   | 4.66    | 1734 
ArrayInt-pick3-false          | TO   | TO   | 4.92    | 2017   | 6.03    | 1500 
ArrayInt-pick3-true-simple    | TO   | TO   | 321.15  | 140593 | 170.43  | 58586 
ArrayInt-pick3-true           | TO   | TO   | 366.98  | 149125 | 240.25  | 62141 
Chromosome-pick3-false-simple | TO   | TO   | 47.8    | 14097  | 1.67    | 834 
Chromosome-pick3-false        | TO   | TO   | 264.21  | 93052  | 4.91    | 3043 
Chromosome-pick3-true-simple  | TO   | TO   | 299.51  | 79613  | 135.56  | 33179 
Chromosome-pick3-true         | TO   | TO   | TO      | TO     | 848.22  | 225044 
PokerHand-pick3-false-part1   | TO   | TO   | 0.57    | 391    | 0.73    | 391 
PokerHand-pick3-false-part2   | TO   | TO   | 0.81    | 228    | 0.81    | 228 
PokerHand-pick3-false         | -    | -    | -       | -      | -       | - 
PokerHand-pick3-true-part1    | TO   | TO   | 2277.03 | 819553 | 1272.58 | 341486 
PokerHand-pick3-true-part2    | TO   | TO   | -       | -      | -       | - 
PokerHand-pick3-true          | -    | -    | -       | -      | -       | - 
Solution-pick3-false          | TO   | TO   | TO      | TO     | TO      | TO 
Solution-pick3-true           | TO   | TO   | TO      | TO     | TO      | TO 


Summary of Experimental Results. Our experiments indicate that our perfor- 
mance improvements are consistent: on all DESCARTES benchmarks (in Table 1, 
which are all small) our techniques either have low overhead or show some 
improvement despite the overhead; and on modified (bigger) programs they lead 
to significant improvements. In particular, we report (Table 2) speedups up to 
21.4x (on an example where the property doesn't hold) and 4.2x (on an example 
where it does). More importantly, we report (Table 3) that DESCARTES times 
out on 14 examples, where of these SYNONYM times out for 2 and cannot infer 
an invariant for one example. 
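
The speedup figures can be recomputed from the times in Table 2. The short sketch below does so under the assumption that the quoted 21.4x and 4.2x compare DESCARTES against SYNONYM on the PokerHand-pick3-false and Solution-pick3-true rows, which is where the ratios match; the dictionary name is only for illustration.

```python
# (DESCARTES time, SYNONYM time) in seconds, copied from Table 2.
times = {
    "PokerHand-pick3-false": (16.91, 0.79),   # property does not hold
    "Solution-pick3-true": (64.5, 15.21),     # property holds
    "Total": (219.64, 59.15),
}

# Speedup is the baseline time divided by the new time.
speedups = {name: descartes / synonym
            for name, (descartes, synonym) in times.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```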


7 Related Work 


The work most closely related to ours is by Sousa and Dillig [27], which pro- 
posed Cartesian Hoare Logic (CHL) for proving k-safety properties and the tool 
DESCARTES for automated reasoning in CHL. In addition to the core program 
logic, CHL includes additional proof rules for loops, referred to as Cartesian 
Loop Logic (CLL). A generalization of CHL, called Quantitative Cartesian Hoare 
Logic was subsequently used by Chen et al. [10] to detect side-channel vulnera- 
bilities in cryptographic implementations. 

In terms of comparison, neither CHL nor CLL force alignment at conditional 
statements or take advantage of symmetries. We believe our algorithm for iden- 
tifying a maximal set of lockstep loops is also novel and can be used in other 
methods that do not rely on CHL/CLL. On the other hand, CLL proof rules 
allow not only fully lockstep loops, but also partially lockstep loops. Although we 
did not consider it here, our maximal lockstep-loop detection algorithm can be 
combined with their partial lockstep execution to further improve the efficiency 
of verification. For example, applying the Fusion 2 rule from CLL to our exam- 
ple while loops generated from RVP (Sect. 2) would result in three subproblems 
and require reasoning twice about the second copy’s loop finishing later. When 
combined with maximal lockstep-loop detection, we could generate just two sub- 
problems: one where the first and third loops terminate first, and another where 
the second loop terminates first. 

Other automatic efforts for relational verification typically use some kind of 
product programs [6, 13,17,21,22,24,28], with a possible reduction to Horn solv- 
ing [13,17,21,24]. Similarly to our strategy for synchrony, most of them attempt 
to leverage similarity (structural or functional) in programs to ease verifica- 
tion. However, we have seen less focus on leveraging relational specifications 
themselves to simplify verification tasks, although this varies according to the 
verification method used. Some efforts do not reason over product programs 
at all, relying on techniques based on decomposition [3] or customized theories 
with theorem proving [4,30] instead. To the best of our knowledge, none of these 
efforts exploit symmetry in programs or in relational specifications. 

On the other hand, symmetry has been used very successfully in model check- 
ing parametric finite state systems [11,15,20] and concurrent programs [14]. Our 
work differs from these efforts in two main respects. First, the parametric sys- 
tems considered in these efforts have components that interact with each other 
or share variables. Second, the correctness specifications are also parametric, 
usually single-index or double-index properties in a propositional (temporal) 
logic. In contrast, in our RVPs, the individual programs are independent and 
do not share any common variables. The only interaction between them is via 
relational specifications. Furthermore, we discover symmetries in these relational 
specifications over multi-index variables, expressed as formulas in first-order the- 
ories (e.g., linear integer arithmetic). We then exploit these symmetries to prune 
redundant RVPs during verification. 

There are also some similarities between relational verification and verification 
of concurrent/parallel programs. In the latter, a typical verifier [18] would 
use visible operations (i.e., synchronization operations or communication on 
shared state) as synchronizing points in the composed program. In our work, 
this selection is made based on the structure of the component programs and 
the ease of utilizing or deriving relational assertions for the code fragments. 
Furthermore, one does not need to consider different orderings in interleavings 
of programs in the RVPs. Since these fragments are independent, it suffices to 
explore any one ordering. Instead, we exploit symmetries in the relational asser- 
tions to prune away redundant RVPs. 

Finally, specific applications may impose additional synchrony requirements 
pertaining to visibility. For example, one may want to check for information 
leaks from private inputs to public outputs not only at the end of a program 
but at other specified intermediate points, or information leakage models for 
side-channel attacks may check for leaks based on given observer models [1]. 
Such requirements can be viewed as relational specifications at selected synchro- 
nizing points in the composed program. Again, we can leverage these relational 
specifications to simplify the resulting verification subproblems. 


8 Conclusions and Future Work 


We have proposed novel techniques for improving relational verification, which 
has several applications including security verification, program equivalence 
checking, and regression verification. Our two key ideas are maximizing the 
amount of code that can be synchronized and identifying symmetries in rela- 
tional specifications to avoid redundant subtasks. Our prototype implementation 
on top of the DESCARTES verification tool leads to consistent improvements on 
a range of benchmarks. In the future, we would be interested in implementing 
these ideas on top of a Horn-based relational verifier (e.g., [25]) and extending it 
to work with recursive data structures. We are also interested in developing an 
algorithm for finding symmetries in formulas that does not rely on an external 
graph automorphism tool. 
Abstract. We present the Java Bounded Model Checker (JBMC), a bounded model 
checking tool for verifying Java bytecode, built on top of the CPROVER framework. 
JBMC processes Java bytecode 
together with a model of the standard Java libraries and checks a set 
of desired properties. Experimental results show that JBMC can cor- 
rectly verify a set of Java benchmarks from the literature and that it is 
competitive with two state-of-the-art Java verifiers. 


1 Introduction 


The Java Programming Language is a general-purpose, concurrent, strongly 
typed, object-oriented language [13]. Applications written in Java are compiled 
to the bytecode instruction set and binary format as defined in the Java Vir- 
tual Machine (JVM) specification. This compiled Java bytecode can run on all 
platforms on top of a JVM without the need for recompilation. However, Java 
programs may have bugs, which may result in array bound violations, unintended 
arithmetic overflows, and other kinds of functional and runtime errors. In addi- 
tion, Java allows multi-threading, and thus, problems such as race conditions 
and deadlocks can occur. 

To detect such issues, we developed an extension to the C Bounded Model 
Checker (CBMC) [6], named JBMC, that verifies Java bytecode. JBMC consists 
of a frontend for parsing Java bytecode and a Java operational model (JOM), 
which is an exact but verification-friendly model of the standard Java libraries. 
A distinct feature of JBMC, when compared with other approaches [2,7,9], is 
the use of Bounded Model Checking (BMC) [4] in combination with Boolean 
Satisfiability and Satisfiability Modulo Theories (SMT) [3] and full symbolic 
state-space exploration, which allows us to perform a bit-accurate verification 
of Java programs. Apart from JBMC, there are other Java verifiers, which use 
different verification approaches. 


Existing Java Verifiers. JayHorn is a verifier for Java bytecode [9] that uses 
the Java optimization framework Soot [14] as a front-end and then produces a 
set of constrained Horn clauses to encode the verification condition (VC). Java 
Path Finder (JPF) is an explicit-state and symbolic software model checker for 
Java bytecode [2]. JPF is used to find and explain defects, collect runtime infor- 
mation as coverage metrics, deduce test vectors, and create corresponding test 
drivers for Java programs. JPF checks for property violations such as deadlocks 
or unhandled exceptions along all potential execution paths as well as user- 
specified assertions. ESC/Java is a compile-time extended static checker, which 
detects common programming errors (e.g., null dereference, array bounds errors, 
and type cast errors) [7]. It uses an automatic theorem prover to catch bugs that 
go beyond the abilities of the Java type checker, including runtime errors and 
synchronization errors in concurrent programs. 


2 JBMC: A Bounded Model Checker for Java Bytecode 


2.1 Architecture and Implementation 


Our front-end integrates a class loader, which accepts Java bytecode class files 
and jar archives (Fig. 1). The parse trees for the classes are translated into the 
CPROVER CFG representation, which is called a GOTO program [6]. 


[Figure: JBMC pipeline. Parse classes → Convert to GOTO → Symbolic Execution → Solver] 


Fig. 1. JBMC verification process 


To handle polymorphism, JBMC encodes virtual method dispatch into a 
switch over the runtime type information attached to the object in order to select 
the correct method to be called. Similarly, the complex control flow arising from 
exceptions is encoded into conditional branches. We record the exception thrown 
in a global variable, which is then used to propagate the exception up the call 
stack until a matching catch statement (if any) to handle the error is reached. 
JBMC can detect when the JVM would abort due to an exception that is not 
caught within the program. 

The resulting GOTO program is then passed to the bounded model check- 
ing algorithm for finding bugs. The BMC algorithm symbolically executes the 
program, unwinding loops and unfolding recursive function calls up to a given 
bound. The resulting bit-vector formula is then passed on to the configured SAT 
or SMT solver [6]. 
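
The idea of unwinding a loop up to a bound can be illustrated with a toy checker. The sketch below is an explicit-state stand-in under strong simplifying assumptions: it enumerates a finite set of initial values instead of building a bit-vector formula for a solver, and the three verdict strings are choices made for this example, not JBMC's output.

```python
def bounded_check(initial_states, guard, step, prop, k):
    """Unwind `while guard(s): s = step(s)` at most k times per initial state,
    checking `prop` on every state that is reached."""
    bound_hit = False
    for start in initial_states:
        state, trace, unwound = start, [start], 0
        while True:
            if not prop(state):
                return "violation", trace     # counterexample trace
            if not guard(state):
                break                         # loop exited within the bound
            if unwound == k:
                bound_hit = True              # deeper iterations stay unexplored
                break
            state = step(state)
            trace.append(state)
            unwound += 1
    return ("bound-reached" if bound_hit else "safe"), None


# Model of: i = nondet in 0..3; while (i < 10) i += 3; property on i.
def guard(i):
    return i < 10


def step(i):
    return i + 3


print(bounded_check(range(4), guard, step, lambda i: i <= 12, 5))
print(bounded_check(range(4), guard, step, lambda i: i <= 11, 5))
print(bounded_check(range(4), guard, step, lambda i: i <= 11, 2))
```

The third call shows why the bound matters: with only two unwindings the violating state is never reached, so the absence of a counterexample says nothing about deeper iterations.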
2.2 Java Operational Model 


The Java language relies on compiler-generated functions and classes as well 
as a large standard library. In order to correctly support Java functionality, 
we developed an abstract representation of the standard Java libraries, called 
the operational model (OM). The use of OMs is commonplace in analysers for 
Java; for instance, a similar approach was previously proposed for the formal 
verification of Android applications [12]. Currently, our OM consists of models 
of the most common classes from java.lang and a few from java.util. Our Java 
OM simplifies the implementation of the standard Java library by removing 
verification-irrelevant performance optimizations (e.g., in the implementation 
of container classes), exploiting declarative specifications (using assume) and 
functions that are built into the CPROVER framework (e.g., for array and string 
manipulation). We are continuously extending our OM to speed up verification 
by replacing the original standard Java library classes by our models. 

Java has an assert(c) statement for specifying safety properties. In addition, 
we provide API classes that allow users to define non-deterministic verification 
harnesses and stub functions. The API contains such methods for primitive 
types (e.g., int nondetInt()) and generic methods (i.e., parametrised by a 
type T) such as <T> T nondetWithNull() and <T> T nondetWithoutNull() to 
non-deterministically initialize object references that may or may not be null. The 
API also provides an assume(c) method, which advises JBMC to ignore paths 
that do not satisfy a user-specified condition c. 

Currently, JBMC handles neither the Java Native Interface, which allows 
Java code to interface native libraries, nor reflection, which allows the program 
to inspect and manipulate itself at runtime. We are currently extending JBMC to 
support generics and lambdas; and to verify multi-threaded Java programs (that 
use java.lang.Thread), exploiting the partial order encoding technique of [1]. 


2.3 String Solver 


One of the biggest challenges in verifying Java programs is the widespread 
use of character strings, which makes verification problems resulting from 
Java programs highly complex. Solving such constraints is an active area of 
research [5,8,11]. JBMC implements a solver for strings to determine the sat- 
isfiability of a set of constraints involving string operations. Our string solver 
supports the most common basic accesses (e.g., obtain the length of a string 
and a character at a given position); comparisons (e.g., lexicographic compari- 
son and equality); transformations (e.g., insertion, concatenation, replacement, 
and removal); and conversions (e.g., conversion of the primitive data types into a 
string and parsing them from a string). The axioms for these operations use quantified 
constraints. For instance, a Java expression s.substring(5) is translated 
into a predicate substring(res, s, 5), where res, s are pairs (length, charArray), 
representing the resulting and the input string, respectively; and substring 
is axiomatized by the formula 
$\forall i.\ (0 \le i \wedge i < s.length - 5) \Rightarrow (res.length = s.length - 5) \wedge (res.charArray[i] = s.charArray[i + 5])$. 
The universal quantifiers are handled using quantifier elimination [10]. 
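
The substring axiom can be read as an executable check on concrete strings. The sketch below mirrors the formula literally, with Python strings standing in for the (length, charArray) pairs; the function name `substring_axiom_holds` is hypothetical and this is a concrete test of the axiom, not the solver's symbolic encoding.

```python
def substring_axiom_holds(res, s, start):
    """Check: for all i with 0 <= i < len(s) - start,
    len(res) == len(s) - start and res[i] == s[i + start].
    Vacuously true when there is no such i."""
    n = len(s) - start
    return all(len(res) == n and res[i] == s[i + start] for i in range(n))


s = "hello world"
print(substring_axiom_holds(s[5:], s, 5))     # Python's own slice as res
print(substring_axiom_holds("world", s, 5))   # wrong length (missing space)
print(substring_axiom_holds(" worlD", s, 5))  # right length, wrong character
```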
2.4 JBMC Usage 


Runtime errors in Java (e.g., illegal memory access) are detected by the JVM and 
an appropriate exception is thrown (e.g., NullPointerException, ArrayIndex- 
OutOfBoundsException). An AssertionError is thrown on violation of a con- 
dition specified by the programmer using the assert keyword. JBMC analyzes 
the program and verifies whether such error conditions occur. 

JBMC can be used to analyze a single class file:² jbmc C.class --unwind k 
or a Java archive (jar) file: jbmc file.jar --main-class class --unwind k. In 
both cases the entry point for the analysis of the program is the static void main 
method of the specified main class. k is a positive integer limiting the number of 
times loops are unwound and recursions are unfolded. If no bug is found, up to a 
k-depth unwinding, then JBMC reports VERIFICATION SUCCESSFUL; otherwise, 
it reports VERIFICATION FAILED along with a counterexample in the form of an 
execution trace (--trace), which contains the full variable assignment in each 
program state with file, method, and line information. Note that if the Java byte- 
code is compiled with debug information, then JBMC can also provide the original 
program variable names in the counterexample, rather than just bytecode variable 
slots. Further JBMC options can be retrieved via jbmc --help. 


[Figure: two panels, (a) JBMC suite and (b) Recursive suite, each showing the correctness of the tools on the respective benchmark suite ('jbmc', 'recursive').] 

Fig. 2. Verification results for JayHorn, JBMC and JPF 


² If a class C is in a package x.y, then compile it to some-dir/x/y/C.class, and in 
some-dir execute jbmc-installation-dir/jbmc x/y/C.class --unwind k. 
Fig. 3. Runtime comparison of JBMC to JayHorn and JPF 


3 Experimental Evaluation 


There is no standard benchmark suite for Java verification. Therefore, we took 
our entire regression test suite consisting of 177 benchmarks (including known 
bugs and hard benchmarks that JBMC cannot yet handle); these benchmarks 
(denoted as “jbmc”) test common Java features (e.g., polymorphism, excep- 
tions, arrays, and strings). We also used 23 recursive benchmarks (denoted as 
“recursive”) taken from the JayHorn repository [9], and 64 minepump benchmarks 
(denoted as “minepump”) from the SV-COMP repository. Additionally, 
we have extracted 104 benchmarks from the JPF regression test suite [2]. The 
following table summarizes the characteristics of the benchmark sets: 


Benchmark set | Total | Safe | Unsafe | Avg. LOC 
jbmc          | 177   | 89   | 88     | 25 
jpf           | 104   | 52   | 52     | 52 
recursive     | 23    | 14   | 9      | 35 
minepump      | 64    | 8    | 56     | 62 
total         | 368   | 163  | 205    | 40 


3.1 Objectives and Setup 


Our experiments aim at answering two research questions: [RQ1] (correctness) 
How accurate is JBMC when verifying the chosen benchmarks? [RQ2] (per- 
formance) How does JBMC performance compare to other existing verifiers? 
To answer both questions, we analyze all benchmarks with three Java verifiers 


³ Benchmarks and detailed results are available at https://www.cprover.org/jbmc. 




(JBMC v5.8-cav18, JayHorn v0.5.1, and JPF v32) on an Intel Core i7-6700 CPU 
8×3.40 GHz, with 32 GB of RAM, running Ubuntu 16.04 LTS. We restrict CPU 
time and memory to 300 s and 15 GB, respectively. JBMC uses a stepwise approach 
to unwinding loops (to prove unbounded safety) and runs with MiniSat2 
as its SAT backend. 


3.2 Results 


Figure 2 gives an overview of the experimental results for the four benchmark 
suites. Correct safe means that the program was analyzed to be free of errors, 
correct unsafe means that the error in the program was found, incorrect safe 
means that the program had an error but the verifier did not find it, incorrect 
unsafe means that an error is reported for a program that fulfills the specifica- 
tion, timeout indicates that the verifier has exceeded the time limit, and error 
represents an internal failure in the verifier or exhaustion of available memory. 
The following table summarizes the overall results: 


        | Correct               | Incorrect             |         | 
        | Total | Safe | Unsafe | Total | Safe | Unsafe | Timeout | Error 
JayHorn | 189   | 52   | 137    | 97    | 5    | 92     | 67      | 15 
JBMC    | 327   | 138  | 189    | 14    | 5    | 9      | 21      | 6 
JPF     | 277   | 158  | 119    | 80    | 77   | 3      | 3       | 8 


The experimental results show that JBMC reached a successful verification 
rate of approximately 89% while JayHorn reported 51% and JPF 75%, which 
positively answers RQ1. JayHorn and JPF currently produce 6 times more incor- 
rect results (i.e., bugs in the tool) than JBMC. To answer RQ2, Fig. 3 compares 
the analysis times for the benchmarks where the tools return correct results. 
None of the three tools is consistently better than the other two. JBMC is faster 
than JPF on 176 benchmarks, whereas JPF is faster than JBMC on 93. JBMC is faster
than JayHorn on 222 benchmarks, whereas JayHorn is faster than JBMC on 25. 
In comparison to JayHorn, JBMC deals poorly with recursion, as its analysis led 
to timeout for 69% of the recursive benchmarks, whereas JayHorn could only 
solve a single benchmark from the minepump benchmark suite. In summary, we 
observed that JBMC’s scalability depends mainly on the complexity of string 
operations, loops, recursion and (floating-point) arithmetic. 
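
The verification rates quoted above follow directly from the counts in the
overall results table. The short Python sketch below recomputes them; the
dictionary and function names are illustrative only, and the counts are copied
from the table (368 benchmarks in total).

```python
# Counts copied from the overall results table; 368 is the benchmark total.
TOTAL = 368
correct = {"JayHorn": 189, "JBMC": 327, "JPF": 277}
incorrect = {"JayHorn": 97, "JBMC": 14, "JPF": 80}


def rate(tool):
    """Share of all benchmarks that the tool classified correctly, in percent."""
    return 100.0 * correct[tool] / TOTAL


for tool in correct:
    ratio = incorrect[tool] / incorrect["JBMC"]
    print(f"{tool}: {rate(tool):.1f}% correct, "
          f"{ratio:.1f}x the incorrect results of JBMC")
```

Rounded to whole percentages, the rates agree with the figures in the text, and
the ratios of incorrect results for JayHorn and JPF are both close to six.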


4 Conclusions and Future Work 


Despite more than 15 years of research in BMC and Java verification, JBMC
is the first BMC-based Java verifier. To achieve this, we based our
implementation on an industrial-strength verification framework, and developed a Java
OM, removing verification-irrelevant optimizations and exploiting declarative
specifications and built-in functions. Because of the prevalent use of character 
strings in Java programs, we have also developed a string solver using an efficient 
quantifier elimination scheme. We compare JBMC to JayHorn and JPF, which 
are state-of-the-art verifiers for Java bytecode based on constrained Horn clauses 
and path-based symbolic execution, respectively. Experimental results show that 
JBMC achieves a successful verification rate of 89% compared to 51% of Jay- 
Horn and 75% of JPF. For future work, the Java OM will be extended to support 
more Java classes, with the goal of speeding up verification of larger Java appli- 
cations. In addition, we are currently extending JBMC to verify multi-threaded 
programs. 


Acknowledgments. We thank P. Rümmer and W. Visser for helpful discussions
about JayHorn and JPF, respectively.

Abstract. We introduce a method of abstraction from infinite-state to
finite-state model checking based on eager theory explication and evalu- 
ate the method in a collection of case studies. 


1 Introduction 


In constructing decision procedures for arithmetic formulas and other theories, 
a successful approach has been to separate propositional reasoning and theory 
reasoning in a modular way. This approach is usually called Satisfiability Mod- 
ulo Theories, or SMT [1]. There are two primary approaches to SMT: eager and 
lazy theory explication. Both approaches abstract the formula in question by con- 
structing its propositional skeleton, that is, converting each atomic predicate to 
a corresponding free Boolean variable. Obviously, propositional abstraction loses 
a great deal of information. The eager approach compensates for this by con- 
joining tautologies of the theory to the formula before propositional abstraction. 
In abstract interpretation terms, we can think of this as a semantic reduction: 
it makes the formula more explicit without changing its semantics. The lazy 
approach, on the other hand, performs the propositional abstraction first, then 
retroactively adds tautologies of the theory to rule out infeasible propositional 
models. 

In this paper, we will consider applying the same concepts to the symbolic
model checking problem (SMC). In this problem, we are given a Kripke model
M that is expressed implicitly using logical formulas, and a temporal formula
$\phi$, and we wish to determine whether $M \models \phi$. The states of the
Kripke model are structures of a logic L over a given vocabulary, while the set
of initial states I and the set of transitions T are expressed, respectively, by
one- and two-vocabulary formulas. The atomic propositions in $\phi$ are also
presumed to be expressed in L.

In the case where L is propositional logic, the Kripke model is finite-state, 
the SMC problem is PSPACE-complete, and many well-developed techniques 
are available to solve it in a heuristically efficient way. On the other hand, if 
L is a richer logic (say, Presburger arithmetic) SMC is usually undecidable. 
Here, we propose to solve instances of this problem by separating propositional 
reasoning and theory reasoning in a modular way, as in SMT. Given an SMC 
problem (I, T, $\phi$), we will form its propositional abstraction by computing the
propositional skeletons of I, T and $\phi$. This abstraction is sound, and allows us to
apply well-developed tools for propositional SMC; however, it loses a great deal
of information. To compensate for this loss, we will use incomplete eager theory 
explication. By controlling theory explication, the user controls the abstraction. 
We will call this general approach eager symbolic model checking, or ESMC. 


Related Work. Because of the propositional abstraction, ESMC may at first 
seem to be a form of predicate abstraction [9]. This is not the case, however. 
Predicate abstraction uses a vocabulary of predicates to abstract the state, but 
does not abstract the theory itself. As a result, a decision procedure for the 
theory is needed to compute the best abstract transformer. This is problematic 
if the logic is undecidable, and in any event requires an exponential number of 
decision procedure calls in the worst case. In ESMC, the abstraction is performed 
in a purely syntactic way. One controls the abstraction by giving a set of axiom 
schemata to be instantiated and by introducing prophecy variables, as opposed 
to giving abstraction predicates. One effect of this is that the abstraction may 
depend on the precise syntactic expression of the transition relation. 

The technique of “datatype reductions” [18] is also closely related. This 
method has been used to verify various parameterized protocols and microar- 
chitectures using finite-state model checking [5,6,12,19,20]. The technique also 
abstracts an infinite-state SMC problem to a finite-state one syntactically. 
Though it does not do this by explicating the theory, we will see that the abstrac- 
tion it produces can be simulated by ESMC. Compared to this method, ESMC 
is user-extensible and allows both a simpler theoretical account and a simpler 
implementation. Moreover, it uses a smaller trusted computing base, since the 
tautologies it introduces can be mechanically checked. 

The methods of Invisible Invariants [25] and Indexed Predicate Abstrac- 
tion [14] use different methods to compute the least fixed point in a finite abstract 
domain of quantified formulas. This requires decidability and incurs a relatively 
high cost for computing an extremal fixed point, limiting scalability (though IPA 
can approximate the best transformer in the undecidable case). The abstractions 
are also difficult to refine in practice. 


Road Map. After preliminaries in the next section, we introduce our
schema-based class of abstractions in Sect. 3. Section 4 gives some useful
instantiations of this class. Section 5 describes a methodology for exploiting the
abstraction in proofs of infinite-state systems, as implemented in the IVy tool.
In Sect. 6, we evaluate the approach using case studies.


2 Preliminaries 


Let $FO_{=}(\mathcal{S}, \Sigma)$ be standard sorted first-order logic with
equality, where $\mathcal{S}$ is a collection of first-order sorts and $\Sigma$
is a vocabulary of sorted non-logical symbols. We assume a special sort
$\mathbb{B} \in \mathcal{S}$ that is the sort of propositions. Each symbol
$f^{S} \in \Sigma$ has an associated sort S of the form
$D_1 \times \cdots \times D_n \to R$, where $D_i, R \in \mathcal{S}$ and
$n \geq 0$ is the arity of the symbol. If $n = 0$, we say $f^{S}$ is a constant,
and if $R = \mathbb{B}$ it is a relation. We write vocab(t) for the set of
non-logical symbols occurring in term t.

Given a set of sorts $\mathcal{S}$, a universe $\mathcal{U}$ maps each sort in
$\mathcal{S}$ to a non-empty set (with $\mathcal{U}(\mathbb{B}) = \{\top, \bot\}$).
An interpretation of a vocabulary $\Sigma$ over universe $\mathcal{U}$ maps each
symbol $f^{D_1 \times \cdots \times D_n \to R}$ in $\Sigma$ to a function in
$\mathcal{U}(D_1) \times \cdots \times \mathcal{U}(D_n) \to \mathcal{U}(R)$.
A $\Sigma$-structure is a pair $M = (\mathcal{U}, \mathcal{I})$ where
$\mathcal{U}$ is a universe and $\mathcal{I}$ is an interpretation of $\Sigma$
over $\mathcal{U}$. The structure is a model of a proposition $\phi$ in
$FO_{=}(\mathcal{S}, \Sigma)$ if $\phi$ evaluates to $\top$ under $\mathcal{I}$
according to the standard semantics of first-order logic. In this case, we write
$M \models \phi$. Given an interpretation $\mathcal{J}$ with domain disjoint
from $\mathcal{I}$, we write $M, \mathcal{J}$ to abbreviate the structure
$(\mathcal{U}, \mathcal{I} \cup \mathcal{J})$.

In the sequel, we take the vocabulary $\Sigma$ to be a disjoint union of four
sets: $\Sigma_S$, the state symbols, $\Sigma_{S'}$, the primed symbols,
$\Sigma_T$, the temporary symbols, and $\Sigma_B$, the background symbols. We
take $(\cdot)'$ to be a bijection $\Sigma_S \to \Sigma_{S'}$ and extend it in
the expected way to terms and interpretations. We write unprime(t) for the
term u such that $u' = t$, if u exists.

A transition system is a pair (I, T) where I is a proposition over
$\Sigma_S \cup \Sigma_B$ and T is a proposition over $\Sigma$. Let
$M_B = (\mathcal{U}, \mathcal{I}_B)$ be a $\Sigma_B$-structure (that is, fix the
universe and the interpretation of the background symbols). A
$\mathcal{U}$-state of the system is an interpretation of $\Sigma_S$ (the state
symbols) over $\mathcal{U}$. An $M_B$-run of the system is an infinite sequence
$s_0, s_1, \ldots$ of $\mathcal{U}$-states such that:

- $M_B, s_0 \models I$, and
- for all $0 \leq i$, there exists an interpretation $\mathcal{I}_T$ of
  $\Sigma_T$ over $\mathcal{U}$ such that
  $M_B, s_i, \mathcal{I}_T, s'_{i+1} \models T$.

That is, under the background interpretation, the initial state must satisfy the
initial condition, and for every successive pair of states, there must be an
interpretation of the temporary symbols such that the transition condition is
satisfied. The temporary symbols are used, for example, to model local variables
of procedures, and may also be Skolem symbols. Because they can have
second-order sort, we cannot existentially quantify them within the logic, so
instead we quantify them implicitly in the transition system semantics. Given a
background theory $\mathcal{T}$ over $\Sigma_B$, a $\mathcal{T}$-run is any
$M_B$-run such that $M_B \models \mathcal{T}$.
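
As a minimal illustration of this run semantics, the following Python sketch
checks whether a finite sequence of states is a prefix of a run. It assumes a
toy system that is not taken from the paper: the state is a single integer x,
the initial condition is x = 0, and the transition condition is x' = x + k for
a temporary symbol k. The candidate values for k are restricted to a finite set
so that the implicit existential quantification can be done by enumeration; the
function name is_run_prefix is likewise hypothetical.

```python
def is_run_prefix(states, init, trans, temp_values):
    """Check the two run conditions on a finite prefix s_0, ..., s_n.

    The initial state must satisfy `init`, and for every successive pair of
    states some value of the temporary symbol must satisfy `trans`.
    """
    if not states or not init(states[0]):
        return False
    return all(
        any(trans(cur, tmp, nxt) for tmp in temp_values)
        for cur, nxt in zip(states, states[1:])
    )


# Toy system (an assumption for illustration): x starts at 0 and each step
# adds the temporary value k, where k ranges over {1, 2}.
init = lambda x: x == 0
trans = lambda x, k, x_next: x_next == x + k

print(is_run_prefix([0, 1, 3, 4], init, trans, [1, 2]))  # each step adds 1 or 2
print(is_run_prefix([0, 3], init, trans, [1, 2]))        # no k in {1, 2} fits
```

The temporary value never appears in the state sequence itself, which mirrors
the way temporary symbols are quantified in the semantics rather than in the
logic.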

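A linear temporal formula over $\Sigma$ applies the operators of
$FO_{=}(\mathcal{S}, \Sigma)$ plus the standard strict until operator
$\mathbf{U}$ and strict since operator $\mathbf{S}$. We define
$\bigcirc\phi = \bot \,\mathbf{U}\, \phi$,
$\Box\phi = \phi \wedge \neg(\top \,\mathbf{U}\, \neg\phi)$ and also
$H\phi = \neg(\top \,\mathbf{S}\, \neg\phi)$, meaning “always $\phi$ in the
strict past”. We fix $\mathcal{T}$ and say $(I, T) \models \phi$ if every
$\mathcal{T}$-run of (I, T) satisfies $\phi$ under the standard LTL semantics.
The symbolic model checking problem SMC is to determine whether
$(I, T) \models \phi$.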


3 A Schema-Based Abstraction Class 


An atom is a proposition in which every instance of
$\{\wedge, \vee, \neg, \mathbf{U}, \mathbf{S}\}$ occurs under a quantifier. The
propositional skeleton of a proposition $\phi$ is obtained by replacing
each atom in $\phi$ by a corresponding propositional constant. The propositional
skeleton is an abstraction, in the sense that for every model M of $\phi$ we can
construct a model of its propositional skeleton from the truth values of each 
atomic proposition in M. We will use propositional skeletons here to convert an 
infinite-state model checking problem to a finite-state one. 
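
To make the construction concrete, here is a small Python sketch of a
propositional skeleton for quantifier-free formulas. The representation is an
assumption made for illustration: atoms are plain strings, compound formulas
are tuples such as ("and", f, g), and the helper names skeleton, evaluate and
satisfiable are hypothetical.

```python
from itertools import product


def skeleton(formula, table):
    """Replace every atom (a string) by a propositional constant name.

    `table` records the injection from atoms to constants, so the same atom
    always maps to the same constant.
    """
    if isinstance(formula, str):
        if formula not in table:
            table[formula] = f"p{len(table)}"
        return table[formula]
    op, *args = formula
    return (op, *(skeleton(arg, table) for arg in args))


def evaluate(formula, valuation):
    """Evaluate a propositional formula under a valuation of its constants."""
    if isinstance(formula, str):
        return valuation[formula]
    op, *args = formula
    values = [evaluate(arg, valuation) for arg in args]
    if op == "not":
        return not values[0]
    if op == "and":
        return all(values)
    if op == "or":
        return any(values)
    raise ValueError(f"unknown operator: {op}")


def satisfiable(formula, constants):
    """Brute-force satisfiability check over the given constants."""
    return any(
        evaluate(formula, dict(zip(constants, bits)))
        for bits in product([False, True], repeat=len(constants))
    )


table = {}
concrete = ("and", "x < y", "y < x")   # unsatisfiable in integer arithmetic
abstract = skeleton(concrete, table)
print(abstract, satisfiable(abstract, list(table.values())))
```

The skeleton of this formula is satisfiable even though the formula itself is
not, which is the information loss that theory explication is meant to
compensate for.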

We assume that each vocabulary $\Sigma_B$, $\Sigma_S$ and $\Sigma_T$ contains a
countably infinite set of propositional constants. This allows us to construct
injections $\mathcal{A}_B$, $\mathcal{A}_S$, $\mathcal{A}_T$ from atomic
propositions of the logic to propositional constants in $\Sigma_B$, $\Sigma_S$
and $\Sigma_T$ respectively.

In defining the propositional skeleton of a transition formula we must con- 
sider atomic propositions containing symbols from more than one vocabulary. To 
which vocabulary should we map such an atom in the propositional skeleton? 
Here, we take a simple solution that is sound, though it may lose some state 
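information. That is, for any atomic proposition $\phi$, we say

- if $\mathrm{vocab}(\phi) \subseteq \Sigma_B$, then $\mathcal{A}(\phi) = \mathcal{A}_B(\phi)$,
- else if $\mathrm{vocab}(\phi) \subseteq \Sigma_B \cup \Sigma_S$ then $\mathcal{A}(\phi) = \mathcal{A}_S(\phi)$,
- else if $\mathrm{vocab}(\phi) \subseteq \Sigma_B \cup \Sigma_{S'}$ then $\mathcal{A}(\phi) = \mathcal{A}_S(\mathrm{unprime}(\phi))'$,
- else $\mathcal{A}(\phi) = \mathcal{A}_T(\phi)$.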


That is, pure background propositions are abstracted to background symbols, 
state propositions are abstracted to state symbols and next-state propositions are 
abstracted to the primed version of the corresponding state proposition. Every- 
thing else is abstracted to a temporary symbol (which is existentially quantified 
in the abstract transition relation). 
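
The four cases can be read as a simple classification by vocabulary. The Python
sketch below illustrates this; symbols are represented as strings, primed
symbols carry a trailing apostrophe, and the function name classify and the
example vocabularies are assumptions made for illustration.

```python
def classify(symbols, background, state, primed):
    """Decide which kind of propositional constant abstracts an atom.

    `symbols` is the set of non-logical symbols occurring in the atom; the
    cases are tried in the same order as in the definition of the mapping.
    """
    symbols = set(symbols)
    if symbols <= background:
        return "background"
    if symbols <= background | state:
        return "state"
    if symbols <= background | primed:
        return "next-state"
    return "temporary"


# Hypothetical vocabularies: p is a background predicate, x and y are state
# symbols, and x' and y' are their primed versions.
background = {"p"}
state = {"x", "y"}
primed = {"x'", "y'"}

print(classify({"p", "x"}, background, state, primed))    # atom p(x)
print(classify({"p", "y'"}, background, state, primed))   # atom p(y')
print(classify({"y'", "x"}, background, state, primed))   # atom y' = x
```

An atom such as y' = x mixes state and primed symbols, so it falls through to
the last case and is abstracted to a temporary symbol.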

We then extend $\mathcal{A}$ to non-atomic formulas in the obvious way, such that
$\mathcal{A}(\phi \wedge \psi) = \mathcal{A}(\phi) \wedge \mathcal{A}(\psi)$,
$\mathcal{A}(\bigcirc\phi) = \bigcirc\mathcal{A}(\phi)$ and so on. The following
theorem shows that we can use propositional skeletons to convert infinite-state
to finite-state model checking problems in a sound (but incomplete) way:


Theorem 1. For any symbolic transition system (I, T) and linear temporal
formula $\phi$, if $(\mathcal{A}(I), \mathcal{A}(T)) \models \mathcal{A}(\phi)$
then $(I, T) \models \phi$.


Intuitively, this holds because we can convert every concrete counterexample to 
an abstract one by simply extracting the truth values of the atomic propositions. 


Theory Explication. While propositional skeletons are sound, they lose a
great deal of information. For example, suppose our transition relation is
$y' = x$. Given a predicate p, we would like to infer that
$p(x) \Rightarrow \bigcirc p(y)$. However, in the propositional skeleton, the
transition relation $\mathcal{A}(T)$ is just $\mathcal{A}_T(y' = x)$.
In other words, it is just a free propositional symbol with no relation to any
other proposition. Thus, we cannot prove the abstracted property
$\mathcal{A}(p(x)) \Rightarrow \bigcirc\mathcal{A}(p(y))$.

To mitigate this loss of information, we use theory explication. That is, before
abstracting T, we conjoin to it tautologies of the logic or the background theory.
This doesn't change the semantics of T, and thus the set of runs of the transition
system remains unchanged. It does, however, change the propositional skeleton.
For example, $y' = x \wedge p(x) \Rightarrow p(y')$ is a tautology of the theory
of equality. If we conjoin this formula to T in the above example, the abstract
transition relation becomes
$\mathcal{A}_T(y' = x) \wedge (\mathcal{A}_T(y' = x) \wedge \mathcal{A}_S(p(x)) \Rightarrow \mathcal{A}_S(p(y))')$,
which is strong enough to prove the abstracted property.
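
The effect of the added tautology can be checked by brute force on this
example. In the Python sketch below, the three Booleans stand for the
propositional constants of the atoms y' = x, p(x) and the primed constant for
p(y); the variable and function names are illustrative, and the check covers a
single transition step only.

```python
from itertools import product


def implies(a, b):
    return (not a) or b


def counterexamples(transition):
    """Valuations (e, px, py_next) allowed by the abstract transition relation
    that violate the abstracted property px => py_next."""
    return [
        (e, px, py_next)
        for e, px, py_next in product([False, True], repeat=3)
        if transition(e, px, py_next) and not implies(px, py_next)
    ]


# e stands for the constant of y' = x, px for the constant of p(x), and
# py_next for the primed constant of p(y).
plain = lambda e, px, py_next: e
explicated = lambda e, px, py_next: e and implies(e and px, py_next)

print(counterexamples(plain))        # the bare skeleton admits a violation
print(counterexamples(explicated))   # the explicated skeleton admits none
```

With the bare skeleton, the step where y' = x and p(x) hold but p(y) fails in
the next state is allowed; conjoining the tautology rules it out.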

In general, theory explication adds predicates to the abstraction. This is 
the only mechanism we will use to add predicates; we will not supply them 
manually, or obtain them automatically from counterexamples. The following 
theorem justifies model checking with eager theory explication: 


Theorem 2. For any symbolic transition system (I, T), linear temporal
formula $\phi$, $\Sigma_B \cup \Sigma_S$ formula $\psi_I$ and $\Sigma$ formula
$\psi_T$, if $\mathcal{T} \models \psi_I \wedge \psi_T$ then
$(I \wedge \psi_I, T \wedge \psi_T) \models \phi$ iff $(I, T) \models \phi$.


The question, of course, is how to choose the tautologies in $\psi_I$ and $\psi_T$. This is
not just a question of capturing the transition relation semantics, since theory 
explication also determines the FO predicates representing state of the finite 
abstraction. Thus, complete theory explication is at least as hard as predicate 
discovery in predicate abstraction. Our goal is not to solve this problem, but to 
find an effective incomplete strategy that is useful in practice. It is important 
that the resulting finite-state model checking problems be easily resolved by a 
modern model checker, and that in case the strategy fails, a human can use the 
resulting counterexample and effectively refine the abstraction. 


Schema-Based Theory Explication. The basic approach we will use to con- 
trolling theory explication is a restricted case of the pattern-based quantifier 
instantiation method introduced in the Simplify prover [8]. That is, we are given 
a set of axioms, and for each axiom a set of triggers. A trigger is a term (or 
terms) containing all of the free variables in the axiom. The trigger is matched 
against all ground subterms in the formula being explicated. Each match induces 
an instance of the axiom. 

In our example above, suppose we have the axiom
$Y = X \wedge p(X) \Rightarrow p(Y)$ with a trigger $Y = X$ (here and in the
sequel, capital letters will stand for free variables). The trigger $Y = X$
matches the ground term $y' = x$ in T, which generates the ground instance
$y' = x \wedge p(x) \Rightarrow p(y')$. Since we match modulo the symmetry of
equality, we also get $x = y' \wedge p(y') \Rightarrow p(x)$.

A risk of trigger-based instantiation is the matching loop. For example, if
we have the axiom $f(X) > X + 1$ with a trigger $f(X)$, then we can generate
an infinite sequence of instantiations: $f(y) > y + 1$, $f(f(y)) > f(y) + 1$
and so on. A simple approach to prevent this is to bound the number of
generations of matching. In practice, we will use just one generation and expand
the set of axioms in cases where more than one generation is needed. This has the benefit
of keeping the number of generated terms small, which limits the size of the 
SMC problem and also makes it easier for users to understand counterexamples. 
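
The generation bound can be sketched in a few lines of Python. To keep the
matcher trivial, the sketch uses a different, hypothetical axiom,
p(X) => p(f(X)) with trigger p(X), whose matching loop is easy to see; terms
are plain strings, only top-level ground terms are matched, and the function
name instantiate is an assumption.

```python
def instantiate(ground_terms, generations):
    """Trigger-match the hypothetical axiom p(X) => p(f(X)) against a set of
    ground terms for a bounded number of generations."""
    terms = set(ground_terms)
    frontier = set(terms)
    instances = []
    for _ in range(generations):
        new_terms = set()
        for term in sorted(frontier):
            if term.startswith("p(") and term.endswith(")"):
                arg = term[2:-1]                 # binding for X
                conclusion = f"p(f({arg}))"
                instances.append(f"{term} => {conclusion}")
                if conclusion not in terms:
                    new_terms.add(conclusion)    # matched in the next generation
        terms |= new_terms
        frontier = new_terms
    return instances


print(instantiate({"p(y)"}, 1))
print(instantiate({"p(y)"}, 3))
```

Each generation only matches terms produced by the previous one, so the number
of instances grows with the bound instead of diverging; with a bound of one,
exactly the instances triggered by the original formula are produced.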

To avoid having to write a large number of axioms, we specify the axioms
using general schemata. A schema is a parameterized axiom. It takes a list of
sorts and symbols as parameters and yields an axiom. In the sequel we will use s
and t to stand for sort parameters. As an example, here is a general congruence
schema that can be used in place of our axiom above:

$$\frac{f : s \to t}{X = Y \Rightarrow f(X) = f(Y) \quad \{X = Y\}}$$

The trigger is in curly braces. We first instantiate the axiom schemata for all
possible parameter valuations using the sorts and symbols of the concrete system.
Then we ground the resulting axioms using pattern-based instantiation.

One further technique is needed, however, to ground the quantifiers occur- 
ring in the formula being explicated. Quantifiers usually occur in the transition 
relations of parameterized systems either in the guards of guarded commands 
or in state updates. As an example, suppose a given command sets the state of 
process p to ‘ready’. This would appear in the transition formula as a constraint 
such as the following: 


$\forall x.\ \mathit{state}'(x) = (\mathit{ready}\ \text{if}\ x = p\ \text{else}\ \mathit{state}(x))$


If this quantifier is not instantiated, then all information about process state will 
be lost. To avoid this, we would like to apply the following schema: 


$$\frac{y : s, \quad p : s \to \mathbb{B}}{(\forall X.\ p(X)) \Rightarrow p(y) \quad \{\forall X.\ p(X)\}}$$


Here we intend that p should match any predicate with one free variable and not 
just a predicate symbol (including non-temporal sub-formulas of the property 
to be proved). However, rather than implement a general second-order matching 
scheme, it is simpler to build this particular schema into the theory explication 
process. There is some question as to which ground terms to supply for the 
parameter y. As with other schemata, only constants are used in the current 
implementation. This appears to be adequate, but it might also be useful to 
allow the user to supply explicit triggers for quantifiers in the transition system 
or property. 
The theory explication process thus has three steps: 


1. Instantiate quantifiers in the formulas using the quantifier schema above. 

2. Generate axioms from the user axiom schemata, supplying symbols from the 
formulas as parameters. 

3. Instantiate the axioms using triggers for one generation. 


Notice this is a slight departure from the policy of one generation of matching, 
since terms generated in step 1 can be used to match axioms in step 3. This is 
important in practice since without grounding the quantifiers there may be no 
ground terms to match in step 3. 


4 Example Abstractions in the Class 


A typical approach to verifying parameterized protocols with finite-state model
checking is to track the state of a representative fixed collection of processes
and abstract away the state of the remaining processes. In this approach,
introduced in [17], a small collection of background constants (typically two or
three) is used to identify the tracked processes. For each process identifier in
the system, the abstraction records whether it is equal to each of the tracked
ids, but carries no further information. For each function f over process ids,
the abstraction maintains the value of f(x) only if x is equal to one of the
background constants. This approach has been used, for example, to verify
processor microarchitectures [12,16,17] and cache coherence protocols [5,6,19].

This abstraction can be implemented using schema-based instantiation. The 
high-level idea is to create a set of schemata that make it possible to abstractly 
evaluate terms in a bottom-up manner. 

For example, consider an occurrence t = u of the equality operator where t
and u are terms of sort s. The abstract value of this term is $\top$ if t and u
are both equal to some background constant c, $\bot$ if $t = c$ and $u \neq c$,
and otherwise is unknown. To implement this abstraction, we use the following
schemata:

$$\frac{c : s}{X = c \wedge Y = c \Rightarrow X = Y \quad \{X = Y\}} \qquad \frac{c : s}{X = c \wedge Y \neq c \Rightarrow X \neq Y \quad \{X = Y\}}$$


The triggers of these two schemata cause them to be applied to every occurrence 
of an equality operator in the formula being abstracted. 
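
The abstract semantics of equality amounts to a three-valued evaluation. The
following Python sketch shows one way to read it, assuming each term is
described by a dictionary that maps every background constant to whether the
term equals it; the function name abstract_eq and the use of None for
"unknown" are illustrative choices.

```python
def abstract_eq(t_equals, u_equals):
    """Abstract value of t = u: True, False, or None for unknown.

    Each argument maps a background constant name to whether the term is
    equal to that constant in the abstract state.
    """
    for c, t_is_c in t_equals.items():
        u_is_c = u_equals[c]
        if t_is_c and u_is_c:
            return True    # X = c and Y = c  implies  X = Y
        if t_is_c != u_is_c:
            return False   # X = c and Y != c implies X != Y (or symmetrically)
    return None            # neither term equals a background constant


tracked_a = {"a": True, "b": False}
tracked_b = {"a": False, "b": True}
untracked = {"a": False, "b": False}

print(abstract_eq(tracked_a, tracked_a))
print(abstract_eq(tracked_a, tracked_b))
print(abstract_eq(untracked, untracked))
```

Two untracked terms compare as unknown, which is exactly the information the
abstraction gives up about processes other than the tracked ones.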

For an application f(t) of a function symbol, the abstract value is the
abstraction of f(c) if t is equal to background constant c, and is otherwise
unknown. This fact could be captured by chaining the congruence schema above
with the above two equality schemata. That is, matching the congruence schema,
we obtain $t = c \Rightarrow f(t) = f(c)$. Then matching the equality operator
schemata with this result, we obtain (in the contrapositive)
$f(t) = f(c) \wedge f(c) = d \Rightarrow f(t) = d$ and
$f(t) = f(c) \wedge f(c) \neq d \Rightarrow f(t) \neq d$ (for any background
constants c, d). Recall, however, that we allow only one generation of matching,
so this second matching step will not occur. Instead, we write the above two
facts explicitly as a schema:

$$\frac{c : s, \quad d : t, \quad f : s \to t}{X = c \Rightarrow (f(X) = d \Leftrightarrow f(c) = d) \quad \{f(X)\}}$$

This schema is matched for every application of a symbol of arity one in the
formula. We also specify similar schemata for arities greater than one. Notice
that this schema also applies to relation symbols if we treat $\top$ and $\bot$
as background constants of sort $\mathbb{B}$. However, for relations and
functions to finitely enumerated sorts, it is more efficient to use the
congruence schema, since it produces fewer instances.

Finally, we need one additional schema to guarantee that the abstract values 
are consistent with the equality relation on the background constants: 


$$\frac{c : s, \quad d : s}{X = c \Rightarrow (X = d \Leftrightarrow c = d) \quad \{X\}}$$

Notice that this axiom is instantiated for every term in the formula (though
in practice not for propositions). Though it doesn't affect satisfiability of
formulas, it is also helpful to add reflexivity, symmetry and transitivity over
the background constants as it makes the resulting counterexamples easier to
understand.

These schemata produce an abstraction of the formula that is at least as 
strong as the datatype reduction for scalarset types described in [18]. In fact, this 
is true if we restrict the application of the schemata to constants c and d in the 
set of background constants, which we do in practice. The cost of the abstraction 
is moderate, since the number of axiom instances is directly proportional to the 
size of the formula and to the number of background constants. 

An advantage of the schema-based explication approach is that we can use 
it to construct abstractions for various datatypes and even use different abstrac- 
tions of the same datatype for different applications. As an example, consider 
an abstraction for totally ordered datatypes such as the integers. We want the 
abstraction to track, for any term t of this sort, whether it is equal to, less 
than or greater than each background constant. The abstract value of a term 
t is captured by the values of the predicates t < c and t = c for background 
constants c. We begin with the abstract semantics of equality given above. The 
abstract semantics of the < relation can be given by the following schemata
(where $t \leq c$ is an abbreviation for $t < c \vee t = c$):

$$\frac{c : s}{X < c \wedge c \leq Y \Rightarrow X < Y \quad \{X < Y\}} \qquad \frac{c : s}{X \leq c \wedge c < Y \Rightarrow X < Y \quad \{X < Y\}}$$

$$\frac{c : s}{Y \leq c \wedge \neg(X < c) \Rightarrow \neg(X < Y) \quad \{X < Y\}}$$

By chaining the congruence schema with these, we can obtain the abstract
semantics of function application, but again we wish to limit the number of
matching generations to one. Thus, as with equality, we write an explicit schema
combining the two steps:


$$\frac{c : s, \quad d : t, \quad f : s \to t}{X = c \Rightarrow (f(X) < d \Leftrightarrow f(c) < d) \quad \{f(X)\}}$$


We also require that the abstract value of every term be consistent with the
interpretation of = and < over the background constants. This gives us:

$$\frac{c : s}{\neg(X = c \wedge X < c) \quad \{X\}} \qquad \frac{c : s, \quad d : s}{X < d \wedge \neg(X < c) \Rightarrow c < d \quad \{X\}}$$

With the equality schemata, these imply that the background constants are
totally ordered. As an extension, if the totally ordered sort has a least element 0,
we can add it as a background constant along with the axiom $\neg(X < 0)$.
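
Schemata of this kind are only useful if every instance is a tautology of the
theory of total orders. The Python sketch below is a quick sanity check by
enumeration over a small integer domain, not a proof. It assumes one reading of
the order schemata, with the non-strict comparisons placed as shown in the
code; the names SCHEMATA and valid_on are hypothetical.

```python
from itertools import product


def implies(a, b):
    return (not a) or b


# One reading of the schemata for a totally ordered sort. X and Y are the free
# variables, c and d are background constants.
SCHEMATA = {
    "lt-then-le": lambda x, y, c, d: implies(x < c and c <= y, x < y),
    "le-then-lt": lambda x, y, c, d: implies(x <= c and c < y, x < y),
    "not-lt": lambda x, y, c, d: implies(y <= c and not (x < c), not (x < y)),
    "eq-excludes-lt": lambda x, y, c, d: not (x == c and x < c),
    "constants-ordered": lambda x, y, c, d: implies(x < d and not (x < c), c < d),
}


def valid_on(domain):
    """Check every schema for all values of X, Y, c, d drawn from `domain`."""
    return {
        name: all(check(*values) for values in product(domain, repeat=4))
        for name, check in SCHEMATA.items()
    }


print(valid_on(range(4)))
```

A schema that fails this check on a small domain cannot be a valid axiom, so
the enumeration is a cheap filter before handing the schemata to a prover.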

This abstraction is a bit weaker than the “ordset” abstraction used, for exam- 
ple, in [20]. We can simulate that abstraction by adding schemata that interpret 
the + operator, and facts about numeric constants such as 0 < 1. In general, for 
a given datatype, we can tailor an abstraction that captures just the properties 
of that type needed to prove a given system property. This extensibility makes 
the schema-based approach more flexible and possibly more efficient than the 
built-in abstractions of [18]. The above schemata have been verified by Z3.


5 Proof Methodology


In the previous sections, we developed an approach to produce a sound finite- 
state abstraction of an infinite-state system using eager theory explication and 
propositional skeletons. Now we consider how to construct proofs of systems 
using this approach. This section is essentially a summary of some results in [18]. 

The first question that arises is how to obtain the set of background constants 
that determine the abstraction. Generally speaking these arise as prophecy
variables. For example, suppose we wish to prove a mutual exclusion property of
the form $\Box\,\forall x, y.\ p(x) \wedge p(y) \Rightarrow x = y$. To do this,
we replace the bound variables x and y with fresh background constants a and b,
to obtain the quantifier-free property
$\Box\; p(a) \wedge p(b) \Rightarrow a = b$. In effect a and b are immutable
prophecy variables that predict the values of x and y for which the property will fail. By
introducing prophecy variables, we refine the abstraction so that it tracks the 
state of the pair of processes that ostensibly cause the mutual exclusion property 
to fail. We hope, of course, to prove that there are no such processes. We apply 
the following theorem to introduce prophecy variables soundly: 


Theorem 3. Let (I, T) be a symbolic transition system, x:s a variable,
$\phi(x)$ a temporal formula and v:s a background symbol not occurring in
I, T, $\phi$. Then


$(I, T) \models \Box\,\forall x.\ \phi(x)$ iff $(I, T) \models \Box\,\phi(v)$.


This theorem can be applied as many times as needed to eliminate universal
quantifiers from an invariance property. Further refinement can be obtained if
needed by manually adding prophecy variables. For example, suppose that each
process x has a ticket number t(x), and we wish to track the ticket number held
by process a at the time of the failure. To do this, we replace our property with
the property $\Box\; c = t(a) \Rightarrow (p(a) \wedge p(b) \Rightarrow a = b)$
where c is a fresh background constant. In general, we can introduce additional
prophecy variables using this theorem:


Theorem 4. Let (I, T) be a transition system, $\phi$ a temporal formula and t a
term. Then $(I, T) \models \Box\,\phi$ iff
$(I, T) \models \Box\,\forall x.\ x = t \Rightarrow \phi$, where x is not free
in $\phi$.


The theorem can be applied repeatedly to introduce as many prophecy variables
as needed to refine the abstraction. The introduced quantifiers can be converted 
to background symbols by the preceding theorem. 

Since our abstraction tracks the state of only processes a and b, a protocol
step in which an untracked process sends a message to a or b is likely to produce
an incorrect result in the abstraction. To mitigate this problem, we assume by
induction over time that our universally quantified invariant property $\phi$ has
always held in the strict past. This makes use of the following theorem: 


Theorem 5. Let (I, T) be a symbolic transition system, and $\phi$ a temporal
formula. Then $(I, T) \models \Box\,\phi$ iff
$(I, T) \models \Box\,(H\phi) \Rightarrow \phi$.


The quantifiers in $\phi$ will be instantiated with ground terms in T. Thus, in
our mutual exclusion example, we can rely on the fact that the sender of a past
message (identified by some temporary symbol) is not in its critical section if
either a or b is. Using induction in this way can mitigate the loss of information
in the finite abstraction. Note we can pull quantifiers out of the above
implication in order to apply Theorem 3. That is,
$(H\,\forall x.\ \phi) \Rightarrow \forall x.\ \phi$ is equivalent to
$\forall x.\ (H\,\forall x.\ \phi) \Rightarrow \phi$.

If the above tactics fail to prove an invariant property because the abstraction 
loses too much information, we can strengthen the invariant by adding conjuncts 
to it. These conjuncts have been called “non-interference lemmas”, since they 
serve to reduce the interference with the tracked processes that is caused by loss 
of information about the untracked processes. We use the following theorem: 


Theorem 6. Let (I, T) be a symbolic transition system, and $\phi, \psi$ temporal
formulas. Then if $(I, T) \models \Box\,\phi \wedge \psi$ then
$(I, T) \models \Box\,\phi$.


The general proof approach has the following steps:


1. Strengthen the invariant property (manually) with Theorem 6.
2. Apply temporal induction with Theorem 5.
3. Add quantifiers to the invariant with Theorem 4.
4. Convert the invariant quantifiers to background symbols with Theorem 3.
5. Add tautologies to the system using Theorem 2 and specified schemata.
6. Abstract to a finite-state SMC problem using Theorem 1.
7. Apply a finite-state symbolic model checker to check the property.
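
The temporal induction and quantifier elimination steps can be traced on the
mutual exclusion example. The derivation below is a sketch under the
assumptions that a and b are fresh background constants and that $\phi$
abbreviates the quantified invariant; it only combines Theorem 5, the
quantifier-pulling equivalence noted after it, and Theorem 3.

```latex
\begin{align*}
\phi \;&\equiv\; \forall x, y.\ (p(x) \wedge p(y) \Rightarrow x = y) \\
(I, T) \models \Box\,\phi
  \;&\iff\; (I, T) \models \Box\,\bigl((H\phi) \Rightarrow \phi\bigr)
  && \text{(Theorem 5)} \\
  \;&\iff\; (I, T) \models \Box\,\forall x, y.\
      \bigl((H\phi) \Rightarrow (p(x) \wedge p(y) \Rightarrow x = y)\bigr)
  && \text{($x, y$ not free in $H\phi$)} \\
  \;&\iff\; (I, T) \models \Box\,
      \bigl((H\phi) \Rightarrow (p(a) \wedge p(b) \Rightarrow a = b)\bigr)
  && \text{(Theorem 3, applied twice)}
\end{align*}
```

The only quantifiers left in the final property are those inside $H\phi$,
which are instantiated with ground terms during theory explication.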


Implementation in IVy. This approach has been implemented in the IVy 
tool [15]. In IVy, the state of the model is expressed in terms of mutable functions 
and relations over primitive sorts. The language is procedural, and allows the 
expression of protocol models as interleavings of atomic guarded commands, the 
semantics of which is expressible in first-order logic. 

To implement the approach, IVy's language was augmented with a syntax for 
expressing schemata. The schemata of Sect. 4 were added to the tool's standard 
library. Syntax is also provided to decorate invariant assertions with terms to be 
used as prophecy variables. IVy extends the above theory slightly by allowing 
invariant properties to be asserted not only between commands, but also in the 
middle of sequential commands. This can be convenient, since it allows invariants 
to reference local variables inside the commands. 

With this input, the tool applies the six transformation steps detailed above 
to produce a purely propositional SMC problem. This problem is then converted 
to the AIGER format [2], a standard for hardware model checking. At present, 
the system only handles safety properties of the form $\Box\,(H\phi) \Rightarrow \phi$, where $\phi$ is
non-temporal. The AIGER format does support liveness, however, and this is 
planned as a future extension. 

The resulting AIGER file is passed to the tool ABC [4] which uses its imple- 
mentation of property driven reachability [10] to check the property. The coun- 
terexample, if any, is converted back to a run of the abstract transition system. 
The propositional symbols in this run are converted back to the corresponding
atoms by inverting the abstraction mapping $\mathcal{A}$. This yields an abstract
counterexample: a sequence of predicate valuations that correspond to both the state
and temporary symbols in the abstraction. 

The abstract counterexample may be spurious in the sense that it corresponds 
to no run of the concrete transition system. In this case, the user must analyze 
the trace to determine where necessary information was lost and either modify 
the invariant or refine the abstraction by adding a prophecy variable. 


6 Case Studies 


In this section, we consider the proof of safety properties of four parameterized 
algorithms and protocols. We wish to address three main questions. First, is 
the abstraction approach efficient? That is, if we construct an abstract model 
using schema-based theory explication, can the resulting finite-state problem 
be solved using a modern symbolic model checker? Second, is the methodology 
usable? That is, can a human user construct a proof using the methodology 
by analyzing the abstract counterexamples? Third, when is it more effective 
than the current best alternative, which is to write an inductive invariant man- 
ually and check it using an SMT solver, as in [11]? We will call this approach 
“invariant checking”. We note that predicate abstraction is not suitable to these
examples because the invariants require complex quantified formulas while cur- 
rent methods that synthesize quantified invariants for parameterized systems are 
unreliable in practice and do not scale well. 

The last question in particular has not been well addressed in prior work on 
model checking approaches to parameterized verification. In most cases, either no 
comparison was made, or comparison was made to proofs using general-purpose 
proof assistants, which tend to be extremely laborious and do not make use 
of current state-of-the-art proof automation techniques. To make a reasonably
direct comparison, we construct proofs of each model using both methodologies,
in the same language and tool, with the state-of-the-art tools ABC [4] for
model checking and Z3 [7] for invariant checking. 

To apply the invariant checking method, some of the protocol models have
been slightly re-encoded. In particular, it is helpful in some cases to use relations
rather than functions in modeling the protocol state, as this can prevent the
prover from diverging in a “matching loop” [8]. This re-encoding adds negligibly
to the proof effort and is arguably harmless, since it does not appear in practice 
to affect the difficulty of refining the model to a concrete implementation. 

Our four example models are: 


1. Tomasulo: a parameterized model of Tomasulo's algorithm for out-of-order
instruction execution, taken from [17]. 

2. German: a model of a simple directory-based cache coherence protocol 
from [6]. 

3. FLASH: a model of a more complex and realistic cache coherence protocol 
from [19,23], based on the Stanford FLASH multiprocessor [13]. 

4. VS-Paxos: a model of Virtually Synchronous Paxos [3], a distributed con- 
sensus algorithm, from [21]. 
Table 1. Comparison of proofs using two methodologies.


                  | Model checking                       | Invariant checking
Model    | Size   | |Inv| | HVars | PVars | |Pf| | Time  | |Inv| | HVars | |Pf| | Time
Tomasulo | 1245   | 100   | 6     | Tl    | 248  | 0.39  | 318   | 5     | 398  | 2.4
German   | 754    | 23    | 1     |       | 29   | 0.60  | 234   | 1     | 240  | 1.8
FLASH    | 2427   | 81    | 3     | 2     | 122  | 69    | 1235  | 1     | 1255 | 9.1
VS-Paxos | 1442   | 224   | 8     | 34    | 512  | 23    | 1022  | 2     | 1101 | 59


A comparison of the proofs obtained using the two methodologies is shown in 
Table 1. The column “Size” shows the textual size of the model plus property in
lexical tokens. The columns labeled |Inv| give the size of the auxiliary invariants 
used in the proofs, expressed in the number of lexical tokens not including the 
property to be proved. Since both methods require the user to supply auxiliary 
invariants, and discovering these invariants is the largest part of the effort in both
cases, this number provides a fairly direct comparison of the complexity of the 
proofs. In both methodologies, the user also defines history or “ghost” variables 
that help in expressing the invariant. The number of these variables is shown 
in the columns labeled HVars. In the model checking approach, the user also 
refines the abstraction by defining prophecy variables. These were not used in 
the invariant checking proofs. The closest analogy in invariant checking proofs to 
this type of information would be quantifier instantiations or triggers provided 
by the user. This was not needed, however, since the methodology of [22] was 
applied to ensure that all verification conditions reside in a decidable fragment 
of the logic. For the model checking methodology, the number of distinct terms 
supplied by the user as prophecy variables is shown in the column labeled PVars. 
The time columns show the total time in seconds for model checking or invariant 
checking for the completed proofs on a 2.6 GHz Intel Xeon CPU using one core. 
Times to produce counterexamples were generally faster. 

When measuring the overall complexity of the proofs, it is unclear how to 
weight the three kinds of information supplied by the user. In a sense, prophecy 
variables are the easiest to handle, since their behavior is monotone. That is, 
adding a prophecy variable only increases precision so it cannot cause passing 
invariants to fail. Ghost variables are more conceptually difficult to introduce, 
since the invariants depend on them. If a ghost variable definition is changed to 
repair a failing invariant, this may cause a different invariant to fail. Similarly if 
we strengthen a passing invariant, it may fail to be proved and if we weaken a 
failing one it may cause other formerly passing invariants to fail. This instability 
can cause the manual proof search to fail to converge and is the chief cause of 
conceptual difficulty in constructing proofs in both methodologies. Having said 
this, for lack of a principled way to weight the different aspects of the proof effort, 
we will measure the proof size as simply the sum of the number of lexical tokens 
in the auxiliary invariant, the history variable definitions, and all terms used as 
prophecy variables. The total proof size is shown in the columns labeled |Pf|.
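
Stated as a formula, the measure just described is the sum below. The names on the right-hand side other than |Inv| are our own shorthand, introduced only for this illustration: HDefs stands for the token count of the history variable definitions and PTerms for the token count of the terms used as prophecy variables.

```latex
\[
  |\mathit{Pf}| \;=\; |\mathit{Inv}| \;+\; |\mathit{HDefs}| \;+\; |\mathit{PTerms}|
\]
```

Note that the HVars and PVars columns count variables rather than tokens, so |Pf| cannot be recomputed from the other columns of Table 1 alone. For the invariant checking proofs, which use no prophecy variables, the last term is zero.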
These numbers should be taken as unreliable for several reasons that are
common to any attempt to measure the effectiveness of a proof methodology. 
First, the size of the proof (or any other measure of the proof difficulty, such 
as expended time) can depend on the proficiency of the user in the particular 
methodology. Even if the same user produces both proofs, the user’s proficiency 
in the two methodologies may differ, and knowledge gained in the first proof 
will affect the second one. Since resources were not available to train and test a
statistically significant population of users in both methodologies (assuming such
could be found) the numbers presented here should not be considered a direct 
comparison of the methods. Rather, they are presented to support some obser- 
vations made below about the specific case studies and proofs. 


Case Study: Tomasulo’s Algorithm. This is a simple abstract model of a 
processor microarchitecture that executes instructions concurrently out of order. 
The model state consists of a register file, a set of reservation stations (RS) and 
a set of execution units (EU) and is parameterized on the size of each of these, 
as well as the data word size. The machine’s instructions are register-to-register 
and are modeled abstractly by an uninterpreted function. Each register has a 
flag that records whether it is the destination of a pending instruction. If so, its 
tag indicates which RS is holding that instruction. Each RS stores the tags of 
its instruction arguments, and waits for these to be computed before issuing the 
instruction to an EU. 

Both proofs are based on history variables that record the correct values of 
arguments and result for each RS. The principal invariant of both states that 
the arguments obtained by all RS's are correct. In the model checking case, the 
abstraction is refined by making the tags of these arguments and chosen EU into 
prophecy variables. This allows the model checker to track enough state information
to prove the main invariant, though one additional “non-interference”
lemma is needed to guarantee that other EU's do not interfere by producing an 
incorrect tag. An interesting aspect of the invariant is that it does not refer to
the states of the register file or EU's. The necessary invariants of these structures 
can be inferred by the model checker. On the other hand, this information must 
be supplied explicitly in the manual invariant. As the table shows, the resulting 
invariant is more complex. 


Case Study: German's Cache Protocol. This simple distributed directory- 
based cache coherence protocol allows the caches to communicate directly only 
with the directory. The property proved is coherence, in effect that exclusive 
copies are exclusive. In the model checking proof, there is one non-interference 
lemma, stating that no cache produces a spurious invalidation acknowledgment 
message. No extra prophecy variables are needed, as tracking the state of just the
two caches that produce the coherence failure suffices. The manual invariant on 
the other hand is much more detailed, in fact about an order of magnitude larger. 
This is because it must relate the state of all the various types of messages in 
the network to the cache and directory states. These relationships were inferred
automatically by the model checker, resulting in a much simpler proof. 


Case Study: FLASH Cache Coherence Protocol. This is a much 
more complex (and realistic) distributed cache coherence protocol model. The 
increased protocol complexity derives from the fact that information can be 
transferred directly from one cache to another. In a typical transaction, a cache 
sends a request to the directory for (say) an exclusive copy of a cache line. The 
directory forwards the request to the current owner of the line, which then sends 
a copy to the original requester, as well as a response to the directory confirming 
the ownership transfer. Handling various race conditions in this scheme makes 
both the protocol and its proof complex. Again the property proved is coherence. 
The model checking proof is similar to [19], though there data correctness and 
liveness were proved. 

In this case, three non-interference lemmas are used in the model checking 
proof, ruling out three types of spurious messages. Also two additional prophecy 
variables are needed. For example, one of these identifies the cache that sent 
an exclusive copy. This allows the abstraction to track the state of the third 
participant in the triangular transaction described above. Generally, protocols 
with more complex communication patterns require more prophecy variables to 
refine the abstraction. 

As with German’s protocol, and for the same reason, the manual invariant 
is an order of magnitude larger. In this case, the additional protocol complexity 
makes it quite challenging to converge to an invariant and a large number of 
strengthenings and weakenings were needed. 


Case Study: Virtually Synchronous Paxos. This is a high-level model of 
a distributed consensus protocol, designed to allow a collection of processes to 
agree on a sequence of decisions, despite process and network failures. This model 
was previously proved by a manual invariant to be consistent, meaning that two
decisions for a given index never disagree [21]. 

The protocol operates in a sequence of epochs, each of which has a leader 
process. The leader proposes decision values and any proposal that receives votes 
of a majority of processes becomes a decision. When the leader fails the protocol 
must move on to a new epoch. For consistency, any decisions that are possibly 
made in the old epoch must be preserved in the new. This is accomplished by 
choosing a majority of processes to start the new epoch and preserving all of 
their votes. Any decision having a majority of votes in the old epoch must have 
one voter in the new epoch’s starting majority and thus must be preserved. The 
choice of an epoch’s starting majority is itself a single-decree consensus problem. 
This is solved in a sequence of rounds called “stakes”. A stake can be created by 
a majority of processes and proposes the votes of some majority to be carried to 
the next epoch. Each process in the stake promises not to accept any lesser stake
with differing votes. If a majority accepts the stake, then the votes of that stake 
can be passed to the next epoch. 
The important auxiliary invariants of the model checking proof are these:


— At each epoch, the votes of the majority that ends the epoch are known to 
the leaders of all future epochs, and 

— When a stake is created, every lesser stake with different votes is “dead” in 
the sense that a majority of nodes has promised not to accept it, and 

— In any epoch, any two accepted stakes agree on their votes. 


Perhaps not surprisingly, the manual invariant is much larger. The model check- 
ing proof, however, requires many extra prophecy variables. This is mainly 
accounted for by the fact that the model has seven unbounded sorts: process 
id’s, decision indices, decision values, epochs, stakes, vote sets and process sets. 
Typically each invariant (including the one to be proved) requires one or two 
prophecy variables of each sort to refine the abstraction (though some of these 
may not be unique). 

An additional complication is dealing with sets and majorities. Sets of pro- 
cesses are represented by an abstract data type. This type provides a predicate 
called ‘majority’ that indicates that a set contains more than half of the pro- 
cess id’s. A function ‘common’ returns a common element between two sets if 
both are majorities (and is otherwise undefined). For example, to prove that 
we cannot have two conflicting decisions, we use the majorities that voted for 
each decision and declare the common process between these majorities as a 
prophecy variable. It then suffices to show that this particular process cannot 
have voted for both decisions (which requires the auxiliary invariants above). 
Since majorities are used in several places in the protocol, this tactic is applied 
several times. 
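
The tactic relies on the fact that any two majorities of the same finite set of process id's share at least one element. The following is a minimal sketch of that fact, not IVy's abstract data type: the Python functions `majority` and `common` merely mirror the names used in the text, the explicit `universe` argument is an assumption of the sketch, and returning `None` stands in for the "otherwise undefined" case.

```python
from itertools import combinations

def majority(s, universe):
    """True if s contains more than half of the process ids in universe."""
    return 2 * len(s & universe) > len(universe)

def common(s1, s2, universe):
    """Return an element shared by s1 and s2 when both are majorities.

    The text leaves the result undefined otherwise; this sketch returns None.
    """
    if majority(s1, universe) and majority(s2, universe):
        # Two sets that each hold more than half of the universe must overlap.
        return min(s1 & s2)
    return None

procs = frozenset(range(5))  # a hypothetical system with five process ids

# Enumerate every subset and keep the majorities.
subsets = [frozenset(c)
           for k in range(len(procs) + 1)
           for c in combinations(procs, k)]
majorities = [s for s in subsets if majority(s, procs)]

# Exhaustively check that 'common' always yields a shared member.
all_intersect = all(common(a, b, procs) in (a & b)
                    for a in majorities for b in majorities)
```

In the proof itself the set of process id's is unbounded, so this exhaustive check only illustrates the property for one small instance; the model checking proof instead names the common process as a prophecy variable.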

Because of the larger number of prophecy variables, our (admittedly arbi- 
trary) measure of overall proof complexity does not show as much advantage 
for model checking in this protocol as it does for the cache protocols. In fact, 
getting the details right in this proof was much more difficult subjectively than 
for FLASH. 

This difficulty may be related to the two sorts in the model that are totally 
ordered: epochs and stakes. For these sorts we use the schemata for totally 
ordered sets detailed in Sect. 4. The ordering of these sorts introduces some dif- 
ficulty in the proof, requiring more detailed invariants. For example, suppose we 
want to show that the first invariant above holds at the moment when a given 
process leaves one epoch and enters the next. The votes received at the epoch 
depend on all the previous epochs. We cannot, however, make all of the unboundedly
many lesser epochs concrete by adding a finite number of prophecy variables.
This means our property must be inductive over epochs, that is, it holds now if
it held in the past at the start of some particular epoch we can identify (perhaps 
the previous one). The need to write invariants that are inductive over ordered 
datatypes may account for the fact that the VS-Paxos invariant is more complex 
than that of the more complex FLASH protocol. 
Discussion. We can make several general observations about these case studies.
First, the performance of the finite-state model checker was never problematic. 
It always produced results in a reasonable amount of time and was not the bot- 
tleneck in constructing any of the proofs. Rather the most time-consuming task 
was usually analyzing the abstract counterexamples. This task proved tractable 
in practice, allowing the proof search process to converge. 

Second, the invariants used in the model checking approach are generally 
much smaller than the manual ones because of the model checker’s ability to 
infer state invariants. 

This advantage may be somewhat offset by the need to provide prophecy 
variables to refine the abstraction, especially in the case where there are many 
unbounded sorts. Moreover, the need to write properties that are inductive over 
ordered sorts may lessen the advantage of model checking in invariant complexity. 
This was evident in the case of VS-Paxos and to some extent in Tomasulo as 
well, because of the implicit induction over the instruction stream. These criteria 
may be helpful in deciding which approach to take to a given proof problem. 

Finally, it is interesting to note that the schemata presented in Sect. 4 proved 
adequate in all cases. That is, in no case was it necessary to add a schema to 
refine the abstraction of the transition relation. This indicates there is no need 
in practice to restrict to decidable logics or pay the cost of computing best 
transformers. 


7 Conclusion 


We have presented a method of abstracting parameterized or infinite-state SMC 
problems to finite-state problems based on propositional skeletons and eager 
theory explication. The method is extensible in the sense that users can add 
abstractions (or refine existing abstractions) by providing axiom schemata. It 
generalizes the ‘datatype reduction’ approach of [18] while giving both a sim- 
pler theoretical account and allowing a simpler implementation. Compared to 
predicate abstraction, it has the advantage that it can be applied to undecidable 
logics and does not require a costly decision procedure in the loop. The approach
has been implemented in the IVy tool. Based on some case studies, we
found that the approach is practical and requires substantially less complex aux- 
iliary invariants than inductive invariant checking. We identified some conditions 
under which the approach is likely to be most effective. 

Conceivably some of the tasks performed here by a human could be auto- 
mated. However, the resulting system would be liable to fail unpredictably 
and opaquely. The present approach is an attempt to create a usable trade-off 
between human input and reliability. 

The next step is to implement liveness. Recent work has constructed liveness 
proofs in IVy by an infinite-state liveness-to-safety reduction, but the proofs are 
complex [21]. It would be interesting to compare this to an approach that leverages
a finite-state model checker's ability to prove liveness. 
Abstract. We show how to leverage reinforcement learning (RL) in
order to speed up static program analysis. The key insight is to estab- 
lish a correspondence between concepts in RL and those in analysis: a 
state in RL maps to an abstract program state in analysis, an action 
maps to an abstract transformer, and at every state, we have a set of 
sound transformers (actions) that represent different trade-offs between 
precision and performance. At each iteration, the agent (analysis) uses a 
policy learned offline by RL to decide on the transformer which minimizes 
loss of precision at fixpoint while improving analysis performance. Our 
approach leverages the idea of online decomposition (applicable to pop- 
ular numerical abstract domains) to define a space of new approximate 
transformers with varying degrees of precision and performance. Using a 
suitably designed set of features that capture key properties of abstract 
program states and available actions, we then apply Q-learning with lin- 
ear function approximation to compute an optimized context-sensitive 
policy that chooses transformers during analysis. We implemented our 
approach for the notoriously expensive Polyhedra domain and evaluated 
it on a set of Linux device drivers that are expensive to analyze. The 
results show that our approach can yield massive speedups of up to two 
orders of magnitude while maintaining precision at fixpoint. 


1 Introduction 


Static analyzers that scale to real-world programs yet maintain high precision are 
difficult to design. Recent approaches to attacking this problem have focused on 
two complementary methods. On one hand is work that designs clever algorithms 
that exploit the special structure of particular abstract domains to speed up
analysis [5,10,15,16,20,21]. These works tackle specific types of analyses but the 
gains in performance can be substantial. On the other hand are approaches that 
introduce creative mechanisms to trade off precision loss for gains in speed [9,12, 
18,19]. While promising, these methods typically do not take into account the 
particular abstract states arising during analysis which determine the precision 
of abstract transformers (e.g., join), resulting in suboptimal analysis precision 
or performance. A key challenge then is coming up with effective and general 
approaches that can decide where and how to lose precision during analysis for
the best tradeoff between performance and precision.


Our Work. We address the above challenge by offering a new approach for 
dynamically losing precision based on reinforcement learning (RL) [24]. The key 
idea is to learn a policy that determines when and how the analyzer should 
lose the least precision at an abstract state to achieve best performance gains. 
Towards that, we establish a correspondence between concepts in static analysis 
and RL, which demonstrates that RL is a viable approach for handling choices 
in the inner workings of a static analyzer. 

To illustrate the basic idea, imagine that a static analyzer has at each program
state two available abstract transformers: the precise but slow $T_p$ and the
fast but less precise $T_f$. Ideally, the analyzer would decide adaptively at each
step on the best choice that maximizes speed while producing a final result of 
sufficient precision. Such a policy is difficult to craft by hand and hence we 
propose to leverage RL to discover the policy automatically. 

To explain the connection with RL intuitively, we think of abstract states 
and transformers as analogous to states of a Go board and moves made by 
the Go player, respectively. In Go, the goal is to learn a policy that at each 
state decides on the next player action (transformer to use) which maximizes 
the chances of eventually winning the game (obtaining a precise fixpoint while 
improving performance in our case). Note that the reward to be maximized 
in Go is long-term and not an immediate gain in position, which is similar to 
iterative static analysis. To learn the policy with RL, one typically extracts a 
set of features $\phi$ from a given state and action, and uses those features to define
a so-called Q-function, which is then learned, determining the desired policy. 

In the example above, a learned policy would determine at each step whether 
to choose action $T_p$ or $T_f$. To do that, for a given state and action, the analyzer
computes the value of the Q-function using the features $\phi$. Querying the Q-function
returns the suggested action from that state. Eventually, such a policy
would ideally lead to a fixpoint of sufficient precision while being computed more quickly.

While the overall connection between static analysis and reinforcement learn- 
ing is conceptually clean, the details of making it work in practice pose significant 
challenges. The first is the design of suitable approximations to actually be able 
to gain performance when precision is lost. The second is the design of features 
$\phi$ that are cheap to compute yet expressive enough to capture key properties
of abstract states. Finally, a suitable reward function combining both precision 
and performance is needed. We show how to solve these challenges for Polyhedra 
analysis. 


Main Contributions. Our main contributions are: 


— A space of sound, approximate Polyhedra transformers spanning different
precision/performance trade-offs. The new transformers combine online decomposition
with different constraint removal and merge strategies for approximations
(Sect. 3).
— A set of feature functions which capture key properties of abstract states and
transformers, yet are efficient to extract (Sect. 4). 

— A complete instantiation of RL for Polyhedra analysis based on Q-learning 
with linear function approximation (i.e., actions, reward function, Q- 
function). 

— An end-to-end implementation and evaluation of our approach. Given a train- 
ing dataset of programs, we first learn a policy (based on the Q-function) 
over analysis runs of these programs. We then use the resulting policy during 
analysis of new, unseen programs. The experimental results on a set of realis- 
tic programs (e.g., Linux device drivers) show that our RL-based Polyhedra 
analysis achieves substantial speed-ups (up to 515x) over a heavily optimized 
state-of-the-art Polyhedra library. 


We believe the reinforcement learning based approach outlined in this work 
can be applied to speed up other program analyzers (beyond Polyhedra). 


2 Reinforcement Learning for Static Analysis 


In this section we first introduce the general framework of reinforcement learning 
and then discuss its instantiation for static analysis. 


2.1 Reinforcement Learning 


Reinforcement learning (RL) [24] involves an agent learning to achieve a goal by 
interacting with its environment. The agent starts from an initial representation 
of its environment in the form of an initial state $s_0 \in S$, where $S$ is the set of
possible states. Then, at each time step $t = 0, 1, 2, \ldots$, the agent performs an
action $a_t \in A$ in state $s_t$ ($A$ is the set of possible actions) and moves to the next
state $s_{t+1}$. The agent receives a numerical reward $r(s_t, a_t, s_{t+1}) \in \mathbb{R}$ for moving
from the state $s_t$ to $s_{t+1}$ through action $a_t$. The agent repeats this process until
it reaches a final state. Each sequence of states and actions from an initial state 
to the final state is called an episode. 

In RL, state transitions typically satisfy the Markov property: the next state 
$s_{t+1}$ depends only on the current state $s_t$ and the action $a_t$ taken from $s_t$. A policy
$p : S \to A$ is a mapping from states to actions: it specifies the action $a_t = p(s_t)$
that the agent will take when in state $s_t$. The agent’s goal is to learn a policy that
maximizes not an immediate but a cumulative reward for its actions in the long
term. The agent does this by selecting the action with the highest expected long-term
reward in a given state. The quality function (Q-function) $Q : S \times A \to \mathbb{R}$
specifies the long-term cumulative reward associated with choosing an action $a_t$
in state $s_t$. Learning this function, which is not available a priori, is essential for
determining the best policy and is explained next. 
Algorithm 1. Q-learning algorithm
 1: function Q-LEARN($S$, $A$, $r$, $\gamma$, $\alpha$, $\phi$)
 2:   Input:
 3:     $S \leftarrow$ set of states, $A \leftarrow$ set of actions, $r \leftarrow$ reward function
 4:     $\gamma \leftarrow$ discount factor, $\alpha \leftarrow$ learning rate
 5:     $\phi \leftarrow$ set of feature functions over $S$ and $A$
 6:   Output: parameters $\theta$
 7:   $\theta \leftarrow$ Initialize arbitrarily (which also initializes $Q$)
 8:   for each episode do
 9:     Start with an initial state $s_0 \in S$
10:     for $t = 0, 1, 2, \ldots, \mathrm{length}(\mathit{episode})$ do
11:       Take action $a_t$, observe next state $s_{t+1}$ and $r(s_t, a_t, s_{t+1})$
12:       $\theta := \theta + \alpha \cdot (r(s_t, a_t, s_{t+1}) + \gamma \cdot \max_{a \in A} Q(s_{t+1}, a) - Q(s_t, a_t)) \cdot \phi(s_t, a_t)$
13:   return $\theta$


Q-learning and Approximating the Q-function. Q-learning [25] can be 
used to learn the Q-function over state-action pairs. Typically the size of the 
state space is so large that it is not feasible to explicitly compute the Q-function 
for each state-action pair and thus the function is approximated. In this paper, we 
consider a linear function approximation of the Q-function for three reasons: (i) 
effectiveness: the approach is efficient, can handle large state spaces, and works 
well in practice [6]; (ii) it leverages our application domain: in our setting, it is 
possible to choose meaningful features (e.g., approximation of volume and cost 
of transformer) that relate to precision and performance of the static analysis 
and thus it is not necessary to uncover them automatically (as done, e.g., by 
training a neural net); and (iii) interpretability of policy: once the Q-function 
and associated policy are learned they can be inspected and interpreted. 

The Q-function is described as a linear combination of $\ell$ basis functions
$\phi_i : S \times A \to \mathbb{R}$, $i = 1, \ldots, \ell$. Each $\phi_i$ is a feature that assigns a value to a
(state, action) pair and $\ell$ is the total number of chosen features. The choice of
features is important and depends on the application domain. We collect the
feature functions into a vector $\phi(s, a) = (\phi_1(s, a), \phi_2(s, a), \ldots, \phi_\ell(s, a))$; doing
so, the Q-function has the form:


\[ Q(s, a) = \sum_{j=1}^{\ell} \theta_j \cdot \phi_j(s, a) = \phi(s, a) \cdot \theta^T, \qquad (1) \]


where $\theta = (\theta_1, \theta_2, \ldots, \theta_\ell)$ is the parameter vector. The goal of Q-learning with
linear function approximation is thus to estimate (learn) $\theta$.
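
As a small illustration of Eq. (1), the sketch below evaluates a linear Q-function as a dot product of feature values and parameters. The three feature functions, the integer states, the action names "precise" and "fast", and the parameter values are all hypothetical and chosen only to make the arithmetic easy to follow; they are not the features used later for Polyhedra analysis.

```python
def q_value(theta, phi, state, action):
    """Q(s, a) = sum_j theta_j * phi_j(s, a) for a list of feature functions phi."""
    return sum(t * f(state, action) for t, f in zip(theta, phi))

# Hypothetical features: states are integers, actions are "precise" or "fast".
phi = [
    lambda s, a: 1.0,                          # constant (bias) feature
    lambda s, a: float(s),                     # a property of the state
    lambda s, a: 1.0 if a == "fast" else 0.0,  # indicator of the chosen action
]
theta = [0.5, -0.1, 2.0]                       # hypothetical learned parameters

q_fast = q_value(theta, phi, 3, "fast")        # 0.5 - 0.1 * 3 + 2.0
q_precise = q_value(theta, phi, 3, "precise")  # 0.5 - 0.1 * 3 + 0.0
```

Because the function is linear in the parameters, each parameter can be read as the weight the policy gives to its feature, which is the interpretability argument made in point (iii) above.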

Algorithm 1 shows the Q-learning procedure. In the algorithm, $0 \le \gamma < 1$
is the discount factor, which represents the difference in importance between
immediate and future rewards. $\gamma = 0$ makes the agent only consider immediate
rewards while $\gamma \approx 1$ gives more importance to future rewards. The parameter
$0 < \alpha \le 1$ is the learning rate that determines the extent to which the newly
acquired information overrides the old information. The algorithm first initializes
$\theta$ randomly. Then, for each step $t$ in an episode, the agent takes an action $a_t$,
moves to the next state $s_{t+1}$ and receives a reward $r(s_t, a_t, s_{t+1})$. Line 12 in
the algorithm shows the equation for updating the parameters $\theta$. Notice that
Q-learning is an off-policy learning algorithm as the update in the equation assumes
that the agent follows a greedy policy (from state $s_{t+1}$) while the action ($a_t$)
taken by the agent (in $s_t$) need not be greedy.

Table 1. Mapping of RL concepts to Static analysis concepts.


RL concept          | Static analysis concept
Agent               | Static analyzer
State $s \in S$     | Features of abstract state
Action $a \in A$    | Abstract transformer
Reward function $r$ | Transformer precision and runtime
Feature             | Value associated with abstract state features and transformer
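
To make the update on line 12 of Algorithm 1 concrete, here is a minimal sketch on a toy problem that has nothing to do with static analysis. Everything in it is an assumption made for illustration: a five-state chain with actions "left" and "right", a reward of 1 for reaching the last state, one-hot features over (state, action) pairs, a zero initialization of the parameters, and a purely random behavior policy. The random behavior policy is chosen deliberately to show the off-policy property: the update still uses the greedy maximum over next actions.

```python
import random

N_STATES = 5                 # states 0..4; state 4 is the final state
ACTIONS = ("left", "right")

def phi(s, a):
    """One-hot feature vector over (state, action) pairs (a hypothetical choice)."""
    vec = [0.0] * (N_STATES * len(ACTIONS))
    vec[s * len(ACTIONS) + ACTIONS.index(a)] = 1.0
    return vec

def q(theta, s, a):
    """Linear Q-function: dot product of the parameters with the features."""
    return sum(t * f for t, f in zip(theta, phi(s, a)))

def step(s, a):
    """Toy deterministic environment: reward 1 on reaching the final state."""
    s_next = min(s + 1, N_STATES - 1) if a == "right" else max(s - 1, 0)
    return s_next, (1.0 if s_next == N_STATES - 1 else 0.0)

def q_learn(episodes=500, gamma=0.9, alpha=0.5, max_steps=200, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * (N_STATES * len(ACTIONS))  # one possible initialization
    for _ in range(episodes):
        s = 0                                  # initial state s_0
        for _ in range(max_steps):
            if s == N_STATES - 1:              # final state reached
                break
            a = rng.choice(ACTIONS)            # the action taken need not be greedy
            s_next, r = step(s, a)
            # Line 12: temporal-difference error scaled by the features of (s, a).
            best_next = max(q(theta, s_next, b) for b in ACTIONS)
            td = r + gamma * best_next - q(theta, s, a)
            theta = [t + alpha * td * f for t, f in zip(theta, phi(s, a))]
            s = s_next
    return theta

def greedy_action(theta, s):
    """Action with the largest learned Q-value in state s."""
    return max(ACTIONS, key=lambda a: q(theta, s, a))

theta = q_learn()
learned = [greedy_action(theta, s) for s in range(N_STATES - 1)]
```

With one-hot features the linear approximation coincides with a lookup table, so the learned values approach the discounted distance to the final state and the greedy action in every non-final state is "right". Realistic feature sets are much smaller than the state space, which is the point of the approximation.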

Once the Q-function is learned, a policy p* for maximizing the agent's cumu- 
lative reward is obtained as: 


\[ p^*(s) = \operatorname*{argmax}_{a \in A} Q(s, a). \qquad (2) \]


In the application, p* is computed on the fly at each stage s by computing Q for 
each action a and choosing the one with maximal Q(s, a). Since the number of 
actions is typically small, this incurs little overhead. 
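
The on-the-fly computation of Eq. (2) amounts to one evaluation of Q per available action. The sketch below assumes a hypothetical Q-function `toy_q` over integer states and two actions named "precise" and "fast"; its numbers are invented so that the precise action wins on small states and the fast one on large states.

```python
def greedy_policy(q, actions):
    """Build p*(s) = argmax over actions of q(s, a), evaluated when queried."""
    def policy(state):
        return max(actions, key=lambda a: q(state, a))
    return policy

def toy_q(state, action):
    """Hypothetical Q-function: a fixed precision gain minus a size-dependent cost."""
    precision = {"precise": 5.0, "fast": 2.0}[action]
    cost = {"precise": 1.0, "fast": 0.2}[action] * state
    return precision - cost

policy = greedy_policy(toy_q, ("precise", "fast"))
choices = [policy(s) for s in (1, 3, 5, 10)]
```

Each query costs as many Q evaluations as there are actions, which is why a small action set keeps the overhead low.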


2.2 Instantiation of RL to Static Analysis 


We now discuss a general recipe for instantiating the RL framework described 
above to the domain of static analysis. The precise formal instantiation to the 
specific numerical (Polyhedra) analysis is provided later. 

In Table 1, we show a mapping between RL and program analysis concepts. 
Here, the analyzer is the agent that observes its environment, which is the 
abstract program state (e.g., polyhedron) arising at every iteration of the anal- 
ysis. In general, the number of possible abstract states can be very large (or 
infinite) and thus, to enable RL in this setting, we abstract the state through 
a set of features (Table 2). An example of a feature could be the number of
bounded program variables or the volume of a polyhedron. The challenge is 
to define the features to be fast to evaluate, yet sufficiently representative so 
the policy derived through learning generalizes well to unseen abstract program 
states. 

Further, at every abstract state, the analyzer should have the choice between 
different actions corresponding to different abstract transformers. The trans- 
formers should range from expensive and precise to cheap and approximate. 
The reward function r is thus composed of a measure of precision and speed and 
should encourage approximations that are both precise and fast. 

The goal of our agent is then to learn an approximation policy that at each
step selects an action that tries to minimize the loss of analysis precision at
fixpoint, while gaining overall performance. Learning such a policy is typically done
offline using a given dataset D of programs (discussed in evaluation). However,
this is computationally challenging because the dataset D can contain many
programs and each program will need to be analyzed many times over during
training: even a single run of the analysis can contain many (e.g., thousands of) calls
to abstract transformers. Thus, a good heuristic may be a complicated function
of the chosen features. Hence, to improve the efficiency of learning in practice,
one would typically exercise the choice for multiple transformers/actions only
at certain program points. A good choice, and the one we employ, is the join points,
where the most expensive transformer in numerical domains usually occurs.

Another key challenge lies in defining a suitable space of transformers. As we 
will see later, we accomplish this by leveraging recent advances in online decom- 
position for numerical domains [20-22]. We show how to do that for the notori- 
ously expensive Polyhedra analysis; however, the approach is easily extendable 
to other popular numerical domains, which all benefit from decomposition. 


3 Polyhedra Analysis and Approximate Transformers 


In this section we first provide brief background on polyhedra analysis and online 
decomposition, a recent technique to speed up analysis without losing precision 
and applicable to all popular numerical domains [22]. Then we leverage online 
decomposition to define a flexible approximation framework that loses precision 
in a way that directly translates into performance gains. This framework forms 
the basis for our RL approach discussed in Sect. 4. 


3.1 Polyhedra Analysis 


Let $\mathcal{X} = \{x_1, x_2, \ldots, x_n\}$ be the set of $n$ (numerical) program variables where
each variable $x_i \in \mathbb{Q}$ takes a rational value. An abstract element $P \subseteq \mathbb{Q}^n$ in the
Polyhedra domain is a conjunction of linear constraints $\sum_{i=1}^{n} a_i x_i \leq c$ between
the program variables where $a_i \in \mathbb{Z}, c \in \mathbb{Q}$. This is called the constraint
representation of the polyhedron.


Constraints and Generator Representation. For efficiency, it is common to
maintain besides the constraint representation also the generator representation,
which encodes a polyhedron as the convex hull of a finite set of vertices, rays,
and lines. Rays and lines are represented by their direction. Thus, by abuse of
prior notation we write $P = (C_P, G_P)$ where $C_P$ is the constraint representation
(before just called $P$) and $G_P$ is the generator representation.

Fig. 1. Two representations of polyhedron P: as a conjunction of 4 constraints $C_P$,
and as the convex hull of 3 vertices and 2 rays $G_P$.
Example 1. Figure 1 shows an example of the two representations of an abstract
element P in the Polyhedra domain. $C_P$ is the intersection of 4 linear constraints:


$$C_P = \{x_1 \geq 2,\; x_2 \geq 2,\; x_2 \leq 10,\; 3x_2 - 5x_1 \leq 5\}.$$

$G_P$ is the convex hull of 3 vertices and 2 rays:

$$G_P = \{\text{vertices}, \text{rays}, \text{lines}\} = \{\{(2, 2), (2, 5), (5, 10)\}, \{(1, 0), (1, 0)\}, \emptyset\}.$$


Notice that $G_P$ contains two rays in the same direction (1,0); thus one of them
could be removed without changing the set of points in P.
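As a sanity check on Example 1, the following sketch verifies that the generators are consistent with the constraints, reading the four constraints as x1 ≥ 2, x2 ≥ 2, x2 ≤ 10 and 3·x2 − 5·x1 ≤ 5. The encoding of a constraint as a triple `(a1, a2, c)` and all function names are assumptions of this illustration, not part of any Polyhedra library.

```python
# Each constraint (a1, a2, c) encodes a1*x1 + a2*x2 <= c.
CONSTRAINTS = [
    (-1, 0, -2),  # x1 >= 2
    (0, -1, -2),  # x2 >= 2
    (0, 1, 10),   # x2 <= 10
    (-5, 3, 5),   # 3*x2 - 5*x1 <= 5
]
VERTICES = [(2, 2), (2, 5), (5, 10)]
RAYS = [(1, 0), (1, 0)]


def satisfies(point, constraints):
    """True if the point lies inside every half-space."""
    return all(a1 * point[0] + a2 * point[1] <= c for a1, a2, c in constraints)


def is_recession_direction(ray, constraints):
    """A ray direction d is valid if a.d <= 0 for every constraint a.x <= c."""
    return all(a1 * ray[0] + a2 * ray[1] <= 0 for a1, a2, _ in constraints)


def tight_count(point, constraints):
    """Number of constraints that hold with equality at the point."""
    return sum(a1 * point[0] + a2 * point[1] == c for a1, a2, c in constraints)


def deduplicate(rays):
    """Drop repeated ray directions; the described set of points is unchanged."""
    return sorted(set(rays))
```

In two dimensions a vertex must make at least two constraints tight, which holds for all three listed vertices, and `deduplicate(RAYS)` leaves the single direction (1, 0).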


During analysis, the abstract elements are manipulated with abstract transformers
that model the effect of statements and control flow in the program such
as assignment, conditional, join, and others. Upon termination of the analysis,
each program statement has an associated subsequent P containing all possible
variable values after this statement. The main bottleneck for the Polyhedra analysis
is the join transformer ($\sqcup$), and thus it is the focus for our approximations.

Recently, Polyhedra domain analysis was sped up by orders of magnitude, 
without approximation, using the idea of online decomposition [21]. The basic 
idea is to dynamically decompose the occurring abstract elements into indepen- 
dent components (in essence abstract elements on smaller variable sets) based on 
the connectivity between variables in the constraints, and to maintain this (per- 
manently changing) decomposition during analysis. The finer the decomposition, 
the faster the analysis. 

Our approximation framework builds on online decomposition. The basic idea 
is simple: we approximate by dropping constraints to reduce connectivity among 
constraints and thus to yield finer decompositions of abstract elements. These 
directly translate into speedup. We consider various options of such approxima- 
tion; reinforcement learning (in Sect. 4) will then learn a proper, context-sensitive 
strategy that stipulates when and which approximation option to apply. 

Next, we provide brief background on the ingredients of online decomposition 
and explain our mechanisms for soundly approximating the join transformer. 


3.2 Online Decomposition 


Online decomposition is based on the observation that during analysis, the set
of variables $\mathcal{X}$ in a given polyhedron P can be partitioned as $\pi_P = \{\mathcal{X}_1, \ldots, \mathcal{X}_r\}$
into blocks $\mathcal{X}_k$, such that constraints exist only between variables in the same
block. Each unconstrained variable $x_i \in \mathcal{X}$ yields a singleton block $\{x_i\}$. Using
this partition, P can be decomposed into a set of smaller Polyhedra $P(\mathcal{X}_k)$ called
factors. As a consequence, the abstract transformer can now be applied only on
the small subset of factors relevant to the program statement, which translates
into better performance.


Example 2. Consider the set $\mathcal{X} = \{x_1, x_2, x_3, x_4, x_5, x_6\}$ and the polyhedron:


$$P = \{2x_1 - 3x_2 + x_3 + x_4 \leq 0,\; x_5 = 0\}.$$

Here, $\pi_P = \{\{x_1, x_2, x_3, x_4\}, \{x_5\}, \{x_6\}\}$ is a possible partition of $\mathcal{X}$ with factors


$$P(\mathcal{X}_1) = \{2x_1 - 3x_2 + x_3 + x_4 \leq 0\}, \quad P(\mathcal{X}_2) = \{x_5 = 0\}, \quad P(\mathcal{X}_3) = \emptyset.$$
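The partition in Example 2 can be reproduced with a small union-find sketch that groups variables occurring in the same constraint. Representing each constraint only by the set of variables it mentions, and the name `finest_partition`, are assumptions of this illustration.

```python
def finest_partition(variables, constraint_vars):
    """Partition variables so that constraints only relate variables in one block.

    constraint_vars is a list of sets, one per constraint, holding the
    variables that occur in that constraint.
    """
    parent = {v: v for v in variables}

    def find(v):
        # Follow parent links to the representative, compressing the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for vars_in_constraint in constraint_vars:
        members = sorted(vars_in_constraint)
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    blocks = {}
    for v in variables:
        blocks.setdefault(find(v), set()).add(v)
    # Unconstrained variables end up in singleton blocks.
    return sorted((frozenset(b) for b in blocks.values()), key=min)


VARIABLES = ["x1", "x2", "x3", "x4", "x5", "x6"]
# 2*x1 - 3*x2 + x3 + x4 <= 0 and x5 = 0
PARTITION = finest_partition(VARIABLES, [{"x1", "x2", "x3", "x4"}, {"x5"}])
```

Here `PARTITION` contains the three blocks {x1, x2, x3, x4}, {x5} and {x6}; the unconstrained variable x6 forms its own block.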


The set of partitions of $\mathcal{X}$ forms a lattice with the ordering $\pi \sqsubseteq \pi'$ iff every block
of $\pi$ is a subset of a block of $\pi'$. Upper and lower bound of two partitions $\pi_1, \pi_2$,
i.e., $\pi_1 \sqcup \pi_2$ and $\pi_1 \sqcap \pi_2$, are defined accordingly.

The optimal (finest) partition for an element P is denoted with $\pi_P$. Ideally,
one would always determine and maintain this finest partition for each output Z
of a transformer but it may be too expensive to compute. Thus, the online
decomposition in [20,21] often computes a (cheaply computable) permissible
partition $\overline{\pi}_Z \sqsupseteq \pi_Z$. Note that making the output partition coarser (while keeping
the same constraints) does not change the precision of the abstract transformer.


3.3 Approximating the Polyhedra Join 


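Let $\overline{\pi}_{com} = \overline{\pi}_{P_1} \sqcup \overline{\pi}_{P_2}$ be a common permissible partition for the inputs $P_1, P_2$
of the join transformer. Then, from [21], a permissible partition for the (not
approximated) output is obtained by keeping all blocks $\mathcal{X}_k \in \overline{\pi}_{com}$ for which
$P_1(\mathcal{X}_k) = P_2(\mathcal{X}_k)$ in the output partition $\overline{\pi}_Z$, and fusing all remaining blocks
into one. Formally, $\overline{\pi}_Z = \{\mathcal{N}\} \cup \mathcal{U}$, where


$$\mathcal{N} = \bigcup \{\mathcal{X}_k \in \overline{\pi}_{com} : P_1(\mathcal{X}_k) \neq P_2(\mathcal{X}_k)\}, \quad \mathcal{U} = \{\mathcal{X}_k \in \overline{\pi}_{com} : P_1(\mathcal{X}_k) = P_2(\mathcal{X}_k)\}.$$


The join transformer computes the generators $G_Z$ for the output Z as
$G_Z = G_{P_1(\mathcal{X} \setminus \mathcal{N})} \times (G_{P_1(\mathcal{N})} \cup G_{P_2(\mathcal{N})})$ where $\times$ is the Cartesian product. The constraint
representation $C_Z$ is computed as $C_Z = C_{P_1(\mathcal{X} \setminus \mathcal{N})} \cup \text{conversion}(G_{P_1(\mathcal{N})} \cup G_{P_2(\mathcal{N})})$.
The conversion algorithm has worst-case exponential complexity and is the most
expensive step of the join. Note that the decomposed join applies it only on the
generators $G_{P_1(\mathcal{N})} \cup G_{P_2(\mathcal{N})}$ corresponding to the block $\mathcal{N}$.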

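The cost of the decomposed join transformer depends on the size of the block
$\mathcal{N}$. Thus, it is desirable to bound this size by a threshold $\in \mathbb{N}$. Let
$\mathcal{B} = \{\mathcal{X}_k \in \overline{\pi}_{com} : \mathcal{X}_k \cap \mathcal{N} \neq \emptyset\}$ be the set of blocks that merge into $\mathcal{N}$ in the output $\overline{\pi}_Z$ and
$\mathcal{B}_t = \{\mathcal{X}_k \in \mathcal{B} : |\mathcal{X}_k| > \text{threshold}\}$ be the set of blocks in $\mathcal{B}$ with size > threshold.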
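The sets introduced above can be computed directly from a common partition. The sketch below assumes each factor is stored as a set of constraint strings and compares factors syntactically; a real implementation would need a semantic equality check on polyhedra. The name `classify_blocks` and the toy factors are hypothetical.

```python
def classify_blocks(common_partition, factors1, factors2, threshold):
    """Return (N, U, B, B_t) for the decomposed join.

    factors1 / factors2 map each block (a frozenset of variables) to the
    factor of P1 / P2 on that block. Equality is syntactic here, which is
    a simplification made for this illustration.
    """
    differing = [b for b in common_partition if factors1[b] != factors2[b]]
    unchanged = [b for b in common_partition if factors1[b] == factors2[b]]
    # N is the union of all blocks whose factors differ; since blocks are
    # disjoint, the blocks intersecting N are exactly the differing ones.
    fused = frozenset().union(*differing)
    large = [b for b in differing if len(b) > threshold]
    return fused, unchanged, differing, large


B1 = frozenset({"x1", "x2", "x3"})
B2 = frozenset({"x4"})
B3 = frozenset({"x5", "x6"})
FACTORS_P1 = {B1: {"x1 - x2 <= 0", "x2 - x3 <= 0"}, B2: {"x4 <= 3"}, B3: {"x5 - x6 <= 0"}}
FACTORS_P2 = {B1: {"x1 - x3 <= 0"}, B2: {"x4 <= 3"}, B3: {"x5 - x6 <= 1"}}
N, U, B, B_T = classify_blocks([B1, B2, B3], FACTORS_P1, FACTORS_P2, threshold=2)
```

With threshold 2, only the block {x1, x2, x3} is too large and would be split, while {x5, x6} is a candidate for merging.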


Splitting of Large Blocks. For each block $\mathcal{X}_t \in \mathcal{B}_t$ we apply the join on
the associated factors: $Z(\mathcal{X}_t) = P_1(\mathcal{X}_t) \sqcup P_2(\mathcal{X}_t)$. We then remove constraints
from $Z(\mathcal{X}_t)$ until it decomposes into blocks of sizes $\leq$ threshold. Since we only
remove constraints from $Z(\mathcal{X}_t)$, the resulting transformer remains sound. There
are many choices for removing constraints as shown in the next example.


Example 3. Consider the following polyhedron and threshold = 4:


$$\mathcal{X}_t = \{x_1, x_2, x_3, x_4, x_5, x_6\},$$
$$Z(\mathcal{X}_t) = \{x_1 - x_2 + x_3 \leq 0,\; x_2 + x_3 + x_4 \leq 0,\; x_2 + x_3 \leq 0,\; x_3 + x_4 \leq 0,\; x_4 - x_5 \leq 0,\; x_4 - x_6 \leq 0\}.$$

We can remove $\mathcal{M} = \{x_4 - x_5 \leq 0,\; x_4 - x_6 \leq 0\}$ from $Z(\mathcal{X}_t)$ to obtain the
constraint set $\{x_1 - x_2 + x_3 \leq 0,\; x_2 + x_3 + x_4 \leq 0,\; x_2 + x_3 \leq 0,\; x_3 + x_4 \leq 0\}$ with
partition $\{\{x_1, x_2, x_3, x_4\}, \{x_5\}, \{x_6\}\}$, which obeys the threshold.

We could also remove $\mathcal{M}' = \{x_2 + x_3 + x_4 \leq 0,\; x_3 + x_4 \leq 0\}$ from $Z(\mathcal{X}_t)$ to
get the constraint set $\{x_1 - x_2 + x_3 \leq 0,\; x_2 + x_3 \leq 0,\; x_4 - x_5 \leq 0,\; x_4 - x_6 \leq 0\}$
with partition $\{\{x_1, x_2, x_3\}, \{x_4, x_5, x_6\}\}$, which also obeys the threshold.


We next discuss our choices for the constraint removal algorithm. 


Stoer-Wagner min-cut. The first basic idea is to remove a minimal number
of constraints in $Z(\mathcal{X}_t)$ that decomposes the block $\mathcal{X}_t$ into two blocks. To do
so, we associate with $Z(\mathcal{X}_t)$ a weighted undirected graph $G = (\mathcal{V}, \mathcal{E})$, where
$\mathcal{V} = \mathcal{X}_t$. Further, there is an edge between $x_i$ and $x_j$ if there is a constraint
containing both; its weight $m_{ij}$ is the number of such constraints. We then
apply the standard Stoer-Wagner min-cut algorithm [23] to obtain a partition
of $\mathcal{X}_t$ into $\mathcal{X}_t'$ and $\mathcal{X}_t''$. $\mathcal{M}$ collects all constraints that need to be removed, i.e.,
those that contain at least one variable from both $\mathcal{X}_t'$ and $\mathcal{X}_t''$.


Example 4. Figure 2 shows the graph G for $Z(\mathcal{X}_t)$ in Example 3. Applying the
Stoer-Wagner min-cut on G once will cut off $x_5$ or $x_6$ by removing the constraint
$x_4 - x_5 \leq 0$ or $x_4 - x_6 \leq 0$, respectively. In either case a block of size 5 remains, exceeding
the threshold of 4. After two applications, both constraints have been removed
and the resulting block structure is given by $\{\{x_1, x_2, x_3, x_4\}, \{x_5\}, \{x_6\}\}$. The
associated factors are $\{x_1 - x_2 + x_3 \leq 0,\; x_2 + x_3 + x_4 \leq 0,\; x_2 + x_3 \leq 0,\; x_3 + x_4 \leq 0\}$
and $x_5, x_6$ become unconstrained.
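The graph construction and the first cut of Example 4 can be checked with the sketch below. To stay dependency-free it finds the minimum cut by brute-force enumeration, which is only a stand-in for the Stoer-Wagner algorithm and is feasible only for tiny graphs; all names are assumptions of this illustration.

```python
from itertools import combinations

# Variable sets of the six constraints of Z(X_t) in Example 3.
CONSTRAINTS = {
    "x1 - x2 + x3 <= 0": {"x1", "x2", "x3"},
    "x2 + x3 + x4 <= 0": {"x2", "x3", "x4"},
    "x2 + x3 <= 0": {"x2", "x3"},
    "x3 + x4 <= 0": {"x3", "x4"},
    "x4 - x5 <= 0": {"x4", "x5"},
    "x4 - x6 <= 0": {"x4", "x6"},
}
VARIABLES = {"x1", "x2", "x3", "x4", "x5", "x6"}


def edge_weights(constraints):
    """Weight of edge (xi, xj) = number of constraints containing both."""
    weights = {}
    for variables in constraints.values():
        for edge in combinations(sorted(variables), 2):
            weights[edge] = weights.get(edge, 0) + 1
    return weights


def brute_force_min_cut(vertices, weights):
    """Exhaustive minimum cut (stand-in for Stoer-Wagner on tiny graphs)."""
    ordered = sorted(vertices)
    anchor, rest = ordered[0], ordered[1:]
    best_weight, best_side = None, None
    for size in range(len(rest)):  # never take all of rest: both sides non-empty
        for chosen in combinations(rest, size):
            side = frozenset((anchor,) + chosen)
            weight = sum(w for (u, v), w in weights.items()
                         if (u in side) != (v in side))
            if best_weight is None or weight < best_weight:
                best_weight, best_side = weight, side
    return best_weight, best_side


def crossing_constraints(constraints, side):
    """Constraints with variables on both sides of the cut (the set M)."""
    return [name for name, vs in constraints.items() if vs & side and vs - side]


WEIGHTS = edge_weights(CONSTRAINTS)
CUT_WEIGHT, SIDE = brute_force_min_cut(VARIABLES, WEIGHTS)
REMOVED = crossing_constraints(CONSTRAINTS, SIDE)
```

The minimum cut has weight 1 and separates a single variable (x5 or x6) from the rest, so one application removes exactly one constraint and leaves a block of size 5, as stated in the example.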


Fig. 2. Graph G for $Z(\mathcal{X}_t)$ in Example 3.

Weighted Constraint Removal. Our second approach for constraint removal does
not associate weights with edges but with constraints. It then greedily removes
constraints with high weights. Specifically, we consider the following two choices
of constraint weights, yielding two different constraint removal policies:


— For each variable $x_i \in \mathcal{X}_t$, we first compute the number $n_i$ of constraints
containing $x_i$. The weight of a constraint is then the sum of the $n_i$ over all
variables occurring in the constraint.

— For each pair of variables $x_i, x_j \in \mathcal{X}_t$, we first compute the number $n_{ij}$ of
constraints containing both $x_i$ and $x_j$. The weight of a constraint is then the
sum of the $n_{ij}$ over all pairs $x_i, x_j$ occurring in the constraint.


Once the weights are computed, we remove the constraint with maximum weight.
The intuition is that variables in this constraint most likely occur in other
constraints in $Z(\mathcal{X}_t)$ and thus they do not become unconstrained upon constraint
removal. This reduces the loss of information.


Example 5. Applying the first definition of weights in Example 3, we get $n_1 = 1$,
$n_2 = 3$, $n_3 = 4$, $n_4 = 4$, $n_5 = 1$, $n_6 = 1$. The constraint $x_2 + x_3 + x_4 \leq 0$ has the
maximum weight of $n_2 + n_3 + n_4 = 11$ and thus is chosen for removal. Removing
this constraint from $Z(\mathcal{X}_t)$ does not yet yield a decomposition; thus we have to
repeat. Doing so, $\{x_3 + x_4 \leq 0\}$ is chosen. Now, $Z(\mathcal{X}_t) \setminus \mathcal{M} = \{x_1 - x_2 + x_3 \leq 0,\;
x_2 + x_3 \leq 0,\; x_4 - x_5 \leq 0,\; x_4 - x_6 \leq 0\}$, which can be decomposed into two factors
$\{x_1 - x_2 + x_3 \leq 0,\; x_2 + x_3 \leq 0\}$ and $\{x_4 - x_5 \leq 0,\; x_4 - x_6 \leq 0\}$ corresponding
to blocks $\{x_1, x_2, x_3\}$ and $\{x_4, x_5, x_6\}$, respectively, each of size $\leq$ threshold.
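Both weight definitions can be evaluated on the constraints of Example 3 with a few lines of code. The sketch below only computes the weights and the first constraint selected for removal; it does not fix how ties are broken in later rounds, which the text leaves open. Function and variable names are hypothetical.

```python
from itertools import combinations

# Variable sets of the six constraints of Z(X_t) in Example 3.
CONSTRAINTS = {
    "x1 - x2 + x3 <= 0": {"x1", "x2", "x3"},
    "x2 + x3 + x4 <= 0": {"x2", "x3", "x4"},
    "x2 + x3 <= 0": {"x2", "x3"},
    "x3 + x4 <= 0": {"x3", "x4"},
    "x4 - x5 <= 0": {"x4", "x5"},
    "x4 - x6 <= 0": {"x4", "x6"},
}


def variable_counts(constraints):
    """n_i: number of constraints containing variable x_i."""
    counts = {}
    for variables in constraints.values():
        for v in variables:
            counts[v] = counts.get(v, 0) + 1
    return counts


def variable_weights(constraints):
    """First policy: weight = sum of n_i over the constraint's variables."""
    counts = variable_counts(constraints)
    return {name: sum(counts[v] for v in vs) for name, vs in constraints.items()}


def pair_weights(constraints):
    """Second policy: weight = sum of n_ij over the constraint's variable pairs."""
    counts = {}
    for variables in constraints.values():
        for pair in combinations(sorted(variables), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {name: sum(counts[p] for p in combinations(sorted(vs), 2))
            for name, vs in constraints.items()}


UNARY = variable_weights(CONSTRAINTS)
BINARY = pair_weights(CONSTRAINTS)
FIRST_REMOVED = max(UNARY, key=UNARY.get)
```

Under the first policy the constraint x2 + x3 + x4 ≤ 0 has weight 11 and is removed first, matching Example 5; under the second policy the same constraint has the largest weight (6).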


Merging Blocks. The sizes of all blocks in $\mathcal{B} \setminus \mathcal{B}_t$ are $\leq$ threshold and we can
apply merging to obtain larger blocks $\mathcal{X}_m$ of size $\leq$ threshold to increase the precision
of the subsequent join. The join is then applied on the factors $P_1(\mathcal{X}_m), P_2(\mathcal{X}_m)$
and the result is added to the output Z. We consider the following three merging
strategies. To simplify the explanation, we assume that the blocks in $\mathcal{B} \setminus \mathcal{B}_t$ are
ordered by ascending size:


1. No merge: None of the blocks are merged. 

2. Merge smallest first: We start merging the smallest blocks as long as the size 
stays below the threshold. These blocks are then removed and the procedure 
is repeated on the remaining set. 

3. Merge large with small: We start to merge the largest block with the smallest 
blocks as long as the size stays below the threshold. These blocks are then 
removed and the procedure is repeated on the remaining set. 


Example 6. Consider threshold = 5 and $\mathcal{B} \setminus \mathcal{B}_t$ with block sizes
{1, 1, 2, 2, 2, 2, 3, 5, 7, 10}. Merging smallest first yields blocks 1 + 1 + 2, 2 + 2,
2 + 3, leaving the rest unchanged. The resulting sizes are {4, 4, 5, 5, 7, 10}. Merging
large with small leaves 10, 7, 5 unchanged and merges 3 + 1 + 1, 2 + 2, and
2 + 2. The resulting sizes are also {4, 4, 5, 5, 7, 10} but the associated factors are
different (since different blocks are merged), which will yield different results in
following transformations.
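The two non-trivial merging strategies can be sketched on block sizes alone. The code below is one possible reading of the informal descriptions, in which "stays below the threshold" is interpreted as "does not exceed the threshold" so that it agrees with Example 6; the function names are hypothetical.

```python
def merge_smallest_first(sizes, threshold):
    """Repeatedly group the smallest remaining blocks while the sum fits."""
    remaining = sorted(sizes)
    groups = []
    while remaining:
        group = [remaining.pop(0)]
        while remaining and sum(group) + remaining[0] <= threshold:
            group.append(remaining.pop(0))
        groups.append(group)
    return groups


def merge_large_with_small(sizes, threshold):
    """Repeatedly pair the largest remaining block with the smallest ones."""
    remaining = sorted(sizes)
    groups = []
    while remaining:
        group = [remaining.pop()]  # largest remaining block
        while remaining and sum(group) + remaining[0] <= threshold:
            group.append(remaining.pop(0))
        groups.append(group)
    return groups


SIZES = [1, 1, 2, 2, 2, 2, 3, 5, 7, 10]
SMALLEST_FIRST = merge_smallest_first(SIZES, threshold=5)
LARGE_WITH_SMALL = merge_large_with_small(SIZES, threshold=5)
```

On the sizes of Example 6 both strategies produce merged blocks of sizes 4, 4, 5, 5, 7, 10, but the groupings differ: 1+1+2, 2+2, 2+3 for smallest first versus 3+1+1, 2+2, 2+2 for large with small.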


Need for RL. Algorithm 2 shows how to approximate the join transformer. 
Different choices of threshold, splitting, and merge strategies yield a range of 
transformers with different performance and precision depending on the inputs. 
All of the transformers are non-monotonic; however, the analysis always converges
to a fixpoint when combined with widening [2]. Determining the suitability of a 
given choice on an input is highly non-trivial and thus we use RL to learn it. 
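Algorithm 2. Approximation algorithm for Polyhedra join

function APPROXIMATE_JOIN(($\overline{\pi}_{P_1}$, $P_1$), ($\overline{\pi}_{P_2}$, $P_2$), threshold)
    Input:
        ($\overline{\pi}_{P_1}$, $P_1$), ($\overline{\pi}_{P_2}$, $P_2$) ← decomposed inputs to the join
        threshold ← upper bound on size of $\mathcal{N}$
    Output: decomposed output ($\overline{\pi}_Z$, Z) of the join

    ▷ initialize output
    Z := $\{P_1(\mathcal{X}_k) : P_1(\mathcal{X}_k) = P_2(\mathcal{X}_k)\}$, $\overline{\pi}_Z$ := $\mathcal{U}$
    $\mathcal{B}$ := $\{\mathcal{X}_k \in \overline{\pi}_{P_1} \sqcup \overline{\pi}_{P_2} : \mathcal{X}_k \cap \mathcal{N} \neq \emptyset\}$, $\mathcal{B}_t$ := $\{\mathcal{X}_k \in \mathcal{B} : |\mathcal{X}_k| > \text{threshold}\}$

    ▷ join factors for blocks in $\mathcal{B}_t$ and split the outputs via a split algorithm
    for $\mathcal{X}_t \in \mathcal{B}_t$ do
        $P'$ := $P_1(\mathcal{X}_t) \sqcup P_2(\mathcal{X}_t)$
        s_algo := split_alg($\mathcal{X}_t$, $C_{P'}$), ($C$, $\pi$) := split($\mathcal{X}_t$, $C_{P'}$, threshold, s_algo)
        for $\mathcal{X}_{t'} \in \pi$ do
            $G(\mathcal{X}_{t'})$ := conversion($C(\mathcal{X}_{t'})$), Z := Z $\cup$ ($C(\mathcal{X}_{t'})$, $G(\mathcal{X}_{t'})$)
        $\overline{\pi}_Z$ := $\overline{\pi}_Z \cup \pi$

    ▷ merge blocks $\in \mathcal{B} \setminus \mathcal{B}_t$ via a merge algorithm and apply join
    m_algo := merge_alg($\mathcal{B} \setminus \mathcal{B}_t$), $\mathcal{B}_m$ := merge($\mathcal{B} \setminus \mathcal{B}_t$, threshold, m_algo)
    for $\mathcal{X}_m \in \mathcal{B}_m$ do
        Z := Z $\cup$ ($P_1(\mathcal{X}_m) \sqcup P_2(\mathcal{X}_m)$), $\overline{\pi}_Z$ := $\overline{\pi}_Z \cup \{\mathcal{X}_m\}$

    return ($\overline{\pi}_Z$, Z)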


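Table 2. Features for describing RL state s ($m \in \{1, 2\}$, $0 \leq j \leq 8$, $0 \leq h \leq 3$).

Feature $\psi_i$ | Extraction complexity | Typical range | $n_i$ | Buckets for feature $\psi_i$
$|\mathcal{B}|$ | O(1) | 1-10 | 10 | $\{[j + 1, j + 1]\} \cup \{[10, \infty)\}$
$\min(|\mathcal{X}_k| : \mathcal{X}_k \in \mathcal{B})$ | O($|\mathcal{B}|$) | 1-100 | 10 | $\{[10 \cdot j + 1, 10 \cdot (j + 1)]\} \cup \{[91, \infty)\}$
$\max(|\mathcal{X}_k| : \mathcal{X}_k \in \mathcal{B})$ | O($|\mathcal{B}|$) | 1-100 | 10 | $\{[10 \cdot j + 1, 10 \cdot (j + 1)]\} \cup \{[91, \infty)\}$
$\text{avg}(|\mathcal{X}_k| : \mathcal{X}_k \in \mathcal{B})$ | O($|\mathcal{B}|$) | 1-100 | 10 | $\{[10 \cdot j + 1, 10 \cdot (j + 1)]\} \cup \{[91, \infty)\}$
$\min(|\bigcup G_{P_m(\mathcal{X}_k)}| : \mathcal{X}_k \in \mathcal{B})$ | O($|\mathcal{B}|$) | 1-1000 | 10 | $\{[100 \cdot j + 1, 100 \cdot (j + 1)]\} \cup \{[901, \infty)\}$
$\max(|\bigcup G_{P_m(\mathcal{X}_k)}| : \mathcal{X}_k \in \mathcal{B})$ | O($|\mathcal{B}|$) | 1-1000 | 10 | $\{[100 \cdot j + 1, 100 \cdot (j + 1)]\} \cup \{[901, \infty)\}$
$\text{avg}(|\bigcup G_{P_m(\mathcal{X}_k)}| : \mathcal{X}_k \in \mathcal{B})$ | O($|\mathcal{B}|$) | 1-1000 | 10 | $\{[100 \cdot j + 1, 100 \cdot (j + 1)]\} \cup \{[901, \infty)\}$
$|\{x_i \in \mathcal{X} : x_i \in [l_m, u_m] \text{ in } P_m\}|$ | O(ng) | 1-25 | 5 | $\{[5 \cdot h + 1, 5 \cdot (h + 1)]\} \cup \{[21, \infty)\}$
$|\{x_i \in \mathcal{X} : x_i \in [l_m, \infty) \text{ in } P_m\}| + |\{x_i \in \mathcal{X} : x_i \in (-\infty, u_m] \text{ in } P_m\}|$ | O(ng) | 1-25 | 5 | $\{[5 \cdot h + 1, 5 \cdot (h + 1)]\} \cup \{[21, \infty)\}$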


4 Reinforcement Learning for Polyhedra Analysis 


We now describe how to instantiate reinforcement learning for approximating 
Polyhedra domain analysis. The instantiation consists of the following steps: 


— Extracting the RL state s from the abstract program state numerically using
a set of features.
— Defining actions a as the choices among the threshold, merge and split methods
defined in the previous section.
— Defining a reward function r favoring both high precision and fast execution.
— Defining the feature functions $\phi(s, a)$ to enable Q-learning.
States. We consider nine features for defining a state s for RL. The features
$\psi_i$, their extraction complexity and their typical range on our benchmarks are
shown in Table 2. The first seven features capture the asymptotic complexity of
the join [21] on the input polyhedra $P_1$ and $P_2$. These are the number of blocks,
the distribution (using maximum, minimum and average) of their sizes, and the
number of generators. The precision of the inputs is captured by considering the
number of variables $x_i \in \mathcal{X}$ with finite upper and lower bound, and the number
of those with only a finite upper or lower bound in both $P_1$ and $P_2$.

As shown in Table 2, each state feature $\psi_i$ returns a natural number; however,
its range can be rather large, resulting in a massive state space. To ensure
scalability and generalization of learning, we use bucketing to reduce the state
space size by clustering states with similar precision and expected join cost. The
number $n_i$ of buckets for each $\psi_i$ and their definition are shown in the last two
columns of Table 2. Using bucketing, the RL state s is then a 9-tuple consisting
of the indices of buckets where each index indicates the bucket that $\psi_i$'s return
value falls into.
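The bucketing of Table 2 amounts to integer division with a catch-all last bucket. The sketch below assumes integer feature values, uses 0-based bucket indices, and clamps values below 1 into the first bucket; these conventions and the names `BUCKET_SPECS`, `bucket_index` and `rl_state` are assumptions of this illustration.

```python
# (bucket width, number of buckets) for the nine features, in Table 2 order:
# |B|, three block-size statistics, three generator statistics, two bound counts.
BUCKET_SPECS = [(1, 10)] + [(10, 10)] * 3 + [(100, 10)] * 3 + [(5, 5)] * 2


def bucket_index(value, width, count):
    """0-based index of the bucket [width*j + 1, width*(j + 1)] holding value.

    Values beyond the last regular bucket fall into the final open-ended
    bucket; values below 1 are clamped into the first bucket (assumption).
    """
    index = (value - 1) // width
    return max(0, min(index, count - 1))


def rl_state(feature_values):
    """Map nine integer feature values to the 9-tuple of bucket indices."""
    return tuple(bucket_index(v, w, n)
                 for v, (w, n) in zip(feature_values, BUCKET_SPECS))


# Number of distinct RL states after bucketing: 10^7 * 5^2.
STATE_SPACE_SIZE = 1
for _, n in BUCKET_SPECS:
    STATE_SPACE_SIZE *= n
```

For instance, `rl_state((3, 1, 45, 12, 100, 950, 101, 0, 21))` evaluates to `(2, 0, 4, 1, 0, 9, 1, 0, 4)`, and the bucketed state space has 250,000,000 states regardless of how large the raw feature values become.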


Actions. An action a is a 3-tuple (th, r_algo, m_algo) consisting of:


— th $\in \{1, 2, 3, 4\}$ depending on threshold $\in [5, 9]$, $[10, 14]$, $[15, 19]$, or $[20, \infty)$.
— r_algo $\in \{1, 2, 3\}$: the choice of a constraint removal, i.e., splitting method.
— m_algo $\in \{1, 2, 3\}$: the choice of merge algorithm.


All three of these have been discussed in detail in Sect. 3. The threshold values 
were chosen based on performance characterization on our benchmarks. With 
the above, we have 36 possible actions per state. 


Reward. After applying the (approximated) join transformer according to
action $a_t$ in state $s_t$, we compute the precision of the output polyhedron $P_1 \sqcup P_2$
by first computing the smallest (often unbounded) box¹ covering $P_1 \sqcup P_2$ which
has complexity O(ng). We then compute the following quantities from this box:


— $n_s$: number of variables $x_i$ with singleton interval, i.e., $x_i \in [l, u]$, $l = u$.

— $n_b$: number of variables $x_i$ with finite upper and lower bounds, i.e., $x_i \in [l, u]$, $l \neq u$.

— $n_{hb}$: number of variables $x_i$ with either finite upper or finite lower bounds,
i.e., $x_i \in (-\infty, u]$ or $x_i \in [l, \infty)$.


Further, we measure the runtime in CPU cycles cyc for the approximate join
transformer. The reward is then defined by

$$r(s_t, a_t, s_{t+1}) = 3 \cdot n_s + 2 \cdot n_b + n_{hb} - \log_{10}(cyc). \qquad (3)$$
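The reward in (3) can be computed from the bounding box alone. In the sketch below a box is assumed to be a list of `(lower, upper)` pairs with `-math.inf` / `math.inf` marking missing bounds; this representation and the function name `reward` are assumptions of this illustration.

```python
import math


def reward(box, cycles):
    """Reward (3): 3*n_s + 2*n_b + n_hb - log10(cycles).

    box is a list of (lower, upper) intervals, one per variable, where an
    absent bound is encoded as -math.inf or math.inf (assumed encoding).
    """
    n_s = n_b = n_hb = 0
    for lower, upper in box:
        lower_finite = lower != -math.inf
        upper_finite = upper != math.inf
        if lower_finite and upper_finite:
            if lower == upper:
                n_s += 1   # singleton interval
            else:
                n_b += 1   # bounded on both sides
        elif lower_finite or upper_finite:
            n_hb += 1      # bounded on exactly one side
        # fully unbounded variables contribute nothing
    return 3 * n_s + 2 * n_b + n_hb - math.log10(cycles)


EXAMPLE_BOX = [(1, 1), (0, 5), (-math.inf, 3), (-math.inf, math.inf)]
EXAMPLE_REWARD = reward(EXAMPLE_BOX, cycles=1000)
```

For the example box, the precision part is 3 + 2 + 1 = 6 and the performance part is log10(1000) = 3, giving a reward of 3.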


As the order of precision for different types of intervals is: singleton >
bounded > half bounded interval, the reward function in (3) weighs their numbers
by 3, 2, 1. The reward function in (3) favors both high performance and
precision. It also ensures that the precision part ($3 \cdot n_s + 2 \cdot n_b + n_{hb}$) has a similar
magnitude range as the performance part ($\log_{10}(cyc)$)².


¹ A natural measure of precision is the volume of $P_1 \sqcup P_2$. However, calculating it is
very expensive and $P_1 \sqcup P_2$ is often unbounded.


Table 3. Instantiation of Q-learning to Polyhedra static analysis.

RL concept | Polyhedra analysis instantiation
Agent | Polyhedra analysis
State $s \in \mathcal{S}$ | As described in Table 2
Action $a \in \mathcal{A}$ | Tuple (th, r_algo, m_algo)
Reward function r | Shown in (3)
Feature $\phi$ | Defined in (4)
Q-function | Q-function from (5)


Q-function. As mentioned before, we approximate the Q-function by a linear
function (1). We define binary feature functions $\phi_{ijk}$ for each (state, action) pair.
$\phi_{ijk}(s, a) = 1$ if the $i$-th entry $s(i)$ of the state tuple lies in the $j$-th bucket and action $a = a_k$:

$$\phi_{ijk}(s, a) = 1 \iff s(i) = j \text{ and } a = a_k \qquad (4)$$

The Q-function is a linear combination of state-action features $\phi_{ijk}$:

$$Q(s, a) = \sum_{i=1}^{9} \sum_{j=1}^{n_i} \sum_{k=1}^{36} \theta_{ijk} \cdot \phi_{ijk}(s, a). \qquad (5)$$
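Because the features in (4) are one-hot indicators, the triple sum in (5) collapses to nine table lookups per action. The sketch below illustrates this together with the greedy policy (2); the nested-list layout of the parameters, the 0-based bucket indices, and all names are assumptions of this illustration.

```python
from itertools import product

# All 36 actions (th, r_algo, m_algo) with th in 1..4 and both algorithms in 1..3.
ACTIONS = list(product(range(1, 5), range(1, 4), range(1, 4)))
# Number of buckets n_i for the nine state features (Table 2).
BUCKETS = [10, 10, 10, 10, 10, 10, 10, 5, 5]


def make_theta():
    """Parameters theta[i][j][k], initialised to zero."""
    return [[[0.0] * len(ACTIONS) for _ in range(n)] for n in BUCKETS]


def q_value(theta, state, k):
    """Q(s, a_k): only the features with s(i) = j are 1, so (5) reduces to
    one lookup per state component (state holds 0-based bucket indices)."""
    return sum(theta[i][state[i]][k] for i in range(len(state)))


def greedy_action(theta, state):
    """Policy (2): the action with maximal Q-value in the given state."""
    best_k = max(range(len(ACTIONS)), key=lambda k: q_value(theta, state, k))
    return ACTIONS[best_k]


NUM_PARAMETERS = sum(BUCKETS) * len(ACTIONS)
```

With 80 buckets in total and 36 actions, the linear Q-function has 2880 parameters, and evaluating the greedy policy costs 36 × 9 lookups per join.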


Q-learning. During the training phase, we are given a dataset of programs 
D and we use Q-LEARN from Algorithm 1 on each program in D to perform 
Q-learning. Q-learning is performed with input parameters instantiated as 
explained above and summarized in Table 3. Each episode consists of a run of
Polyhedra analysis on a benchmark in D. We run the analysis multiple times on 
each program in D and update the Q-function after each join by calling Q-LEARN. 

A Q-function is typically learned using an $\epsilon$-greedy policy [24] where the
agent takes greedy actions by exploiting the current Q-estimates while also
exploring randomly. The policy requires initial random exploration to learn good
Q-estimates that can be later exploited. This is infeasible for the Polyhedra analysis
as a typical episode contains thousands of join calls. Therefore, we generate
actions for Q-learning by exploiting the optimal policy for precision (which
always selects the precise join) and explore performance by choosing a random
approximate join: both with a probability of 0.5³.


² The log is used since the join has exponential complexity.
³ We also tried exploitation probabilities of 0.7 and 0.9; however, the resulting policies
had suboptimal performance during testing due to limited exploration.
Formally, the action $a_t := p(s_t)$ selected in state $s_t$ during learning is given
by $a_t = (th, r\_algo, m\_algo)$ where


$$th = \begin{cases} \text{rand}() \,\%\, 4 + 1 & \text{with probability } 0.5 \\ \min\left(4, \left(\sum_{k} |\mathcal{X}_k|\right) / 5\right) & \text{with probability } 0.5 \end{cases} \qquad (6)$$


$$r\_algo = \text{rand}() \,\%\, 3 + 1, \quad m\_algo = \text{rand}() \,\%\, 3 + 1.$$
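The action selection used during training can be sketched as follows. This is one reading of (6): the sum is taken over the sizes of the blocks passed in, the division by 5 is treated as integer division, and Python's `random.Random` replaces `rand()`; all three choices, as well as the name `training_action`, are assumptions of this illustration.

```python
import random


def training_action(block_sizes, rng):
    """Sample (th, r_algo, m_algo) as in (6), under the stated assumptions.

    block_sizes: sizes |X_k| of the blocks entering the sum in (6).
    rng: a random.Random instance standing in for rand().
    """
    if rng.random() < 0.5:
        # Explore performance: pick a random threshold class.
        th = rng.randrange(4) + 1
    else:
        # Exploit precision: threshold class derived from the total block
        # size, capped at the largest class (integer division is assumed).
        th = min(4, sum(block_sizes) // 5)
    r_algo = rng.randrange(3) + 1
    m_algo = rng.randrange(3) + 1
    return th, r_algo, m_algo


RNG = random.Random(0)
SAMPLES = [training_action([3, 4, 10], RNG) for _ in range(200)]
```

Each sample is a valid action tuple; for block sizes 3, 4 and 10 the deterministic branch yields th = 3, while the random branch covers all four threshold classes.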


Obtaining the Learned Policy. After learning over the dataset D, the learned
approximating join transformer in state $s_t$ chooses an action according to (2)
by selecting the maximal value over all actions. The value of th = 1, 2, 3, 4 is
decoded as threshold = 5, 10, 15, 20 respectively.


5 Experimental Evaluation 


We implemented our approach in the form of a C-library for Polyhedra analysis, 
called Poly-RL. We compare the performance and precision of Poly-RL against 
the state-of-the-art ELINA [1], which uses online decomposition for Polyhedra 
analysis without losing precision. In addition, we implemented two Polyhedra 
analysis approximations (baselines) based on the following heuristics: 


— Poly-Fixed: uses a fixed strategy based on the results of Q-learning. Namely, 
we selected the threshold, split and merge algorithm most frequently chosen 
by our (adaptive) learned policy during testing. 

— Poly-Init: uses an approximate join with probability 0.5 based on (6). 


All Polyhedra implementations use 64-bit integers to encode rational num- 
bers. In the case of overflow, the corresponding polyhedron is set to top. 


Experimental Setup. All our experiments including learning the parameters $\theta$
for the Q-function and the evaluation of the learned policy on unseen benchmarks
were carried out on a 2.13 GHz Intel Xeon E7-4830 Haswell CPU with 24 MB
L3 cache and 256 GB memory. All Polyhedra implementations were compiled
with gcc 5.4.0 using the flags -O3 -m64 -march=native.


Analyzer. For both learning and evaluation, we used the crab-llvm analyzer
for C-programs, part of the larger SeaHorn [7] verification framework. The analyzer
performs intra-procedural analysis of llvm-bitcode to generate Polyhedra
invariants which can be used for verifying assertions using an SMT solver [11].


Benchmarks. SVCOMP [3] contains thousands of challenging benchmarks in
different categories suited for different kinds of analysis. We chose the Linux
Device Drivers (LD) category, known to be challenging for Polyhedra analysis
[21], as proving properties in these programs requires Polyhedra invariants
(and not, say, Octagon invariants, which are weaker).
Training Dataset. We chose 70 large benchmarks for Q-learning. We ran each
benchmark a thousand times over a period of three days to generate sample 
traces of Polyhedra analysis containing thousands of calls to the join transformer. 
We set a timeout of 5 minutes per run and discarded incomplete traces in case 
of a timeout. In total, we performed Q-learning over 110811 traces. 


Evaluation Method. For evaluating the effectiveness of our learned policy, we 
then chose benchmarks based on the following criteria: 


— No overfitting: the benchmark was not used for learning the policy. 

— Challenging: ELINA takes > 5s on the benchmark. 

— Fair: there is no integer overflow in the expensive functions in the benchmark.
In the case of an overflow, the polyhedron is set to top, resulting in
a trivial fixpoint at no cost and thus in a speedup that is due to overflow.


Based on these criteria, we found 11 benchmarks on which we present our results. 
We used a timeout of 1 h and memory limit of 100 GB for our experiments. 


Inspecting the Learned Policy. Our learned policy chooses in the majority
of cases threshold = 20, the binary weighted constraint removal algorithm for
splitting, and the merge smallest first algorithm for merging. Poly-Fixed always 
uses these values for defining an approximate transformer, i.e., it follows a fixed 
strategy. Our experimental results show that following this fixed strategy results 
in suboptimal performance compared to our learned policy that makes adaptive, 
context-sensitive decisions to improve performance. 


Results. We measure the precision as a fraction of program points at which 
the Polyhedra invariants generated by approximate analysis are semantically the 
same or stronger than the ones generated by ELINA. This is a less biased and 
more challenging measure than the number of discharged assertions [4,18,19] 
where one can write weak assertions that even a weaker domain can prove. 

Table 4 shows the number of program points⁴, timings (in seconds), and the
precision (in %) of Poly-RL, Poly-Fixed, and Poly-Init w.r.t. ELINA on all 11
benchmarks. In the table, the entry TO (MO) means that the analysis did not 
finish within 1h (exceeded the memory limit). For an incomplete analysis, we 
compute the precision by comparing program points for which the incomplete 
analysis can produce invariants. 


Poly-RL vs ELINA. In Table 4, Poly-RL obtains > 7x speed-up over ELINA
on 6 of the 11 benchmarks with a maximum of 515x speedup for the mfd_sm501
benchmark. It also obtains the same or stronger invariants on ≥ 87% of program
points on 8 benchmarks. Note that Poly-RL obtains both large speedups and
the same invariants at all program points on 3 benchmarks.


⁴ The benchmarks contain up to 50K LOC but SeaHorn encodes each basic block as
one program point, thus the number of points in Table 4 is significantly reduced.


Table 4. Timings (seconds) and precision of approximations (%) w.r.t. ELINA.

Benchmark | #Program points | ELINA time | Poly-RL time | Poly-RL precision | Poly-Fixed time | Poly-Fixed precision | Poly-Init time | Poly-Init precision
wireless_airo | 2372 | 877 | 6.6 | 100 | 6.7 | 100 | 5.2 | 74
net_ppp | 680 | 2220 | 9.1 | 87 | TO | 34 | 7.7 | 55
mfd_sm501 | 369 | 1596 | 3.1 | 97 | 1421 | 97 | 2 | 64
ideapad_laptop | 461 | 172 | 2.9 | 100 | 157 | 100 | MO | 41
pata_legacy | 262 | 41 | 2.8 | 41 | 2.5 | 41 | MO | 27
usb_ohci | 1520 | 22 | 2.9 | 100 | 34 | 100 | MO | 50
usb_gadget | 1843 | 66 | 37 | 60 | 35 | 60 | TO | 40
wireless_b43 | 3226 | 19 | 13 | 66 | TO | 28 | 83 | 34
lustre_llite | 211 | 5.7 | 4.9 | 98 | 5.4 | 98 | 6.1 | 54
usb_cx231xx | 4752 | 7.3 | 3.9 | 100 | 3.7 | 100 | 3.9 | 94
netfilter_ipvs | 5238 | 20 | 17 | 100 | 9.8 | 100 | 11 | 94
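The speedup figures quoted above follow directly from the ELINA and Poly-RL timing columns of Table 4, as the short sketch below shows. The dictionary layout and variable names are assumptions of this illustration; the numbers are copied from the table.

```python
# (ELINA time, Poly-RL time) in seconds, taken from Table 4.
TIMES = {
    "wireless_airo": (877, 6.6),
    "net_ppp": (2220, 9.1),
    "mfd_sm501": (1596, 3.1),
    "ideapad_laptop": (172, 2.9),
    "pata_legacy": (41, 2.8),
    "usb_ohci": (22, 2.9),
    "usb_gadget": (66, 37),
    "wireless_b43": (19, 13),
    "lustre_llite": (5.7, 4.9),
    "usb_cx231xx": (7.3, 3.9),
    "netfilter_ipvs": (20, 17),
}

# Speedup of Poly-RL over ELINA per benchmark.
SPEEDUPS = {name: elina / poly_rl for name, (elina, poly_rl) in TIMES.items()}
# Benchmarks on which the speedup exceeds 7x.
LARGE_SPEEDUPS = sorted(name for name, s in SPEEDUPS.items() if s > 7)
# Benchmark with the largest speedup.
BEST = max(SPEEDUPS, key=SPEEDUPS.get)
```

Six benchmarks exceed a 7x speedup, and the largest ratio, 1596 / 3.1 ≈ 515, is obtained on mfd_sm501.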

The widening transformer removes many constraints produced by the precise 
join transformer from ELINA which allows Poly-RL to obtain the same invari- 
ants as ELINA despite the loss of precision during join in most cases. Poly-RL 
produces a large number of non-comparable fixpoints on 3 benchmarks in Table 4
due to non-monotonic join transformers. 

We also tested Poly-RL on 17 benchmarks from the product lines category. 
ELINA did not finish within an hour on any of these benchmarks whereas Poly- 
RL finished within 1s. Poly-RL had 100% precision on the subset of program 
points at which ELINA produces invariants. With Poly-RL, SeaHorn successfully 
discharged the assertions. We did not include these results in Table 4 as the
precision w.r.t. ELINA cannot be completely compared. 


Poly-RL vs Poly-Fixed. Poly-Fixed is never significantly more precise than 
Poly-RL in Table 4. Poly-Fixed is faster than Poly-RL on 4 benchmarks, however 
the speedups are small. Poly-Fixed is slower than ELINA on 3 benchmarks 
and times out on 2 of these. This is due to the overhead of the binary weight 
constraints removal algorithm and the exponential number of generators in the 
output. 


Poly-RL vs Poly-Init. From (6), Poly-Init takes random actions and thus the 
quality of its result varies depending on the run. Table 4 shows the results on a 
sample run. Poly-RL is more precise than Poly-Init on all benchmarks in Table 4. 
Poly-Init also does not finish on 4 benchmarks. 
6 Related Work


Our work can be seen as part of the general research direction on parametric 
program analysis [4,9,14,18,19], where one tunes the precision and cost of the 
analysis by adapting it to the analyzed program. The main difference is that prior 
approaches fix the learning parameters for a given program while our method 
is adaptive and can select parameters dynamically based on the abstract states 
encountered during analysis, yielding better cost/precision tradeoffs. Further, 
prior work measures precision by the number of assertions proved whereas we 
target the stronger notion of fixpoint equivalence. 

The works of [20,21] improve the performance of Octagon and Polyhedra
domain analysis respectively based on online decomposition without losing pre- 
cision. We compared against [21] in this paper. As our results suggest, the perfor- 
mance of Polyhedra analysis can be significantly improved with RL. We believe 
that our approach can be easily extended to the Octagon domain for achieving 
speedups over the work of [20] as the idea of online decomposition applies to all 
sub-polyhedra domains [22]. 

Reinforcement learning based on linear function approximation of the Q- 
function has been applied to learn branching rules for SAT solvers in [13]. 
The learned policies achieve performance similar to those of the best branching
rules. We believe that more powerful techniques for RL such as deep Q-networks 
(DQN) [17] or double Q-learning [8] can be investigated to potentially improve 
the quality of results produced by our approach. 


7 Conclusion 


Polyhedra analysis is notoriously expensive and has worst-case exponential com- 
plexity. We showed how to gain significant speedups by adaptively trading preci- 
sion for performance during analysis, using an automatically learned policy. Two 
key insights underlie our approach. First, we identify reinforcement learning as a 
conceptual match to the learning problem at hand: deciding which transformers 
to select at each analysis step so as to achieve the eventual goal of high precision
and fast convergence to fixpoint. Second, we build on the concept of online
decomposition, and offer an effective method to directly translate precision loss 
into significant speed-ups. Our work focused on polyhedra analysis for which we 
provide a complete implementation and evaluation. We believe the approach can 
be instantiated to other forms of static analysis in future work. 
Abstract. We present an alternative Double Description representation
for the domain of NNC (not necessarily closed) polyhedra, together
with the corresponding Chernikova-like conversion procedure. The rep- 
resentation uses no slack variable at all and provides a solution to a 
few technical issues caused by the encoding of an NNC polyhedron as 
a closed polyhedron in a higher dimension space. A preliminary exper- 
imental evaluation shows that the new conversion algorithm is able to 
achieve significant efficiency improvements. 


1 Introduction 


The Double Description (DD) method [28] allows for the representation and 
manipulation of convex polyhedra by using two different geometric representa- 
tions: one based on a finite collection of constraints, the other based on a finite 
collection of generators. Starting from any one of these representations, the other 
can be derived by application of a conversion procedure [10-12], thereby obtain- 
ing a DD pair. The procedure is incremental, capitalizing on the work already 
done when new constraints and/or generators need to be added to an input DD 
pair. 

The DD method lies at the foundation of many software libraries and tools¹
which are used, either directly or indirectly, in research fields as diverse as
bioinformatics [31,32], computational geometry [1,2], analysis of analog and
hybrid systems [8,18,22,23], automatic parallelization [6,29], scheduling [16],
and static analysis of software [4,13,15,17,21,24].

In the classical setting, the DD method is meant to compute geometric rep- 
resentations for topologically closed polyhedra in an n-dimensional vector space. 
However, there are applications requiring the ability to also deal with linear strict 
inequality constraints, leading to the definition of not necessarily closed (NNC) 
polyhedra. For example, this is the case for some of the analysis tools developed 
for the verification of hybrid systems [8,18,22,23], static analysis tools such as 
Pagai [24], and tools for the automatic discovery of ranking functions [13]. 

The few DD method implementations providing support for NNC polyhedra
(Apron and PPL) are all based on an indirect representation. The approach,
proposed in [22,23] and studied in more detail in [3,5], encodes the strict
inequality constraints by means of an additional space dimension, playing the
role of a slack variable; the new space dimension, usually denoted as $\epsilon$,
needs to be non-negative and bounded from above, i.e., the constraints
$0 \leq \epsilon \leq 1$ are added to the topologically closed representation
$\mathcal{R}$ (called $\epsilon$-representation) of the NNC polyhedron $\mathcal{P}$.
The main advantage of this approach is the possibility of reusing,
almost unchanged, all of the well-studied algorithms and optimizations that have
been developed for the classical case of closed polyhedra. However, the addition
of a slack variable carries with it a few technical issues.

¹ An incomplete list of available implementations includes cdd [19], PolyLib [27],
Apron [25], PPL [4], 4ti2 [1], Skeleton [33], Addibit [20], ELINA [30].


- At the implementation level, more work is needed to make the $\epsilon$ dimension
  transparent to the end user.

- The $\epsilon$-representation causes an intrinsic overhead: in any generator system
  for an $\epsilon$-polyhedron, most of the "proper" points (those having a positive
  $\epsilon$ coordinate) need to be paired with the corresponding "closure" point
  (having a zero $\epsilon$ coordinate), almost doubling the number of generators.

- The DD pair in minimal form computed for an $\epsilon$-representation $\mathcal{R}$,
  when reinterpreted as encoding the NNC polyhedron $\mathcal{P}$, typically includes
  many redundant constraints and/or generators, leading to inefficiencies. To avoid
  this problem, strong minimization procedures were defined in [3,5] that are
  able to detect and remove those redundancies. Even though effective, these
  procedures are not fully integrated into the DD conversion: they can only be
  applied after the conversion, since they interfere with incrementality. Hence,
  during the iterations of the conversion the $\epsilon$-redundancies are not removed,
  causing the computation of bigger intermediate results.


In this paper, we pursue a different approach for the handling of NNC poly- 
hedra in the DD method. Namely, we specify a direct representation, dispensing 
with the need of the slack variable. The main insight of this new approach is the 
separation of the (constraints or generators) geometric representation into two 
components, the skeleton and the non-skeleton of the representation, playing 
quite different roles: while keeping a geometric encoding for the skeleton compo- 
nent, we will adopt a combinatorial encoding for the non-skeleton one. For this 
new representation, we propose the corresponding variant of Chernikova's
conversion procedure, where both components are handled by respective pro- 
cessing phases, so as to take advantage of their peculiarities. In particular, we 
develop ad hoc functions and procedures for the combinatorial non-skeleton part. 

The new representation and conversion procedure, in principle, can be integrated
into any of the available implementations of the DD method. Our experimental
evaluation is conducted in the context of the PPL and shows that the
new algorithm, while computing the correct results for all of the considered tests, 
achieves impressive efficiency improvements with respect to the implementation 
based on the slack variable. 

The paper is structured as follows. Section 2 briefly introduces the required 
notation, terminology and background concepts. Section 3 proposes the new rep- 
resentation for NNC polyhedra; the proofs of the stated results are in [7]. The 
extension of Chernikova's conversion algorithm to this new representation is
presented in Sect. 4. Section 5 reports the results obtained by the experimental
evaluation. We conclude in Sect. 6.


2 Preliminaries 


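We assume some familiarity with the basic notions of lattice theory [9]. For a
lattice $\langle L, \sqsubseteq, \bot, \top, \sqcap, \sqcup \rangle$, an element
$a \in L$ is an atom if $\bot \sqsubset a$ and there exists no element $b \in L$
such that $\bot \sqsubset b \sqsubset a$. For $S \subseteq L$, the upward closure
of $S$ is defined as
$\uparrow S \stackrel{\mathrm{def}}{=} \{ x \in L \mid \exists s \in S \,.\, s \sqsubseteq x \}$.
The set $S \subseteq L$ is upward closed if $S = \uparrow S$; we denote by
$\wp_{\uparrow}(L)$ the set of all the upward closed subsets of $L$. For
$x \in L$, $\uparrow x$ is a shorthand for $\uparrow \{x\}$. The notation for
downward closure is similar. Given two posets $\langle L, \sqsubseteq \rangle$ and
$\langle L^\sharp, \sqsubseteq^\sharp \rangle$ and two monotonic functions
$\alpha\colon L \to L^\sharp$ and $\gamma\colon L^\sharp \to L$, the pair
$(\alpha, \gamma)$ is a Galois connection [14] between $L$ and $L^\sharp$ if
$\forall x \in L, x^\sharp \in L^\sharp : \alpha(x) \sqsubseteq^\sharp x^\sharp \Leftrightarrow x \sqsubseteq \gamma(x^\sharp)$.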

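We write $\mathbb{R}^n$ to denote the Euclidean topological space of dimension
$n > 0$ and $\mathbb{R}_+$ for the set of non-negative reals; for
$S \subseteq \mathbb{R}^n$, $\mathrm{cl}(S)$ and $\mathrm{relint}(S)$ denote
the topological closure and the relative interior of $S$, respectively. A topologically
closed convex polyhedron (for short, closed polyhedron) is defined as the set of
solutions of a finite system $\mathcal{C}$ of linear non-strict inequality and linear
equality constraints; namely, $\mathcal{P} = \mathrm{con}(\mathcal{C})$ where

$$\mathrm{con}(\mathcal{C}) \stackrel{\mathrm{def}}{=} \{\, p \in \mathbb{R}^n \mid \forall \beta = (a^T x \bowtie b) \in \mathcal{C},\ \bowtie\, \in \{\geq, =\} \,.\, a^T p \bowtie b \,\}.$$

A vector $r \in \mathbb{R}^n$ such that $r \neq 0$ is a ray of a non-empty polyhedron
$\mathcal{P} \subseteq \mathbb{R}^n$ if, $\forall p \in \mathcal{P}$ and
$\forall \rho \in \mathbb{R}_+$, it holds $p + \rho r \in \mathcal{P}$. The empty
polyhedron has no rays. If both $r$ and $-r$ are rays of $\mathcal{P}$, then $r$ is a
line of $\mathcal{P}$. The set $\mathcal{P} \subseteq \mathbb{R}^n$ is a
closed polyhedron if there exist finite sets $L, R, P \subseteq \mathbb{R}^n$ such
that $0 \notin (L \cup R)$ and $\mathcal{P} = \mathrm{gen}(\langle L, R, P \rangle)$, where

$$\mathrm{gen}(\langle L, R, P \rangle) \stackrel{\mathrm{def}}{=} \Bigl\{\, L\lambda + R\rho + P\pi \in \mathbb{R}^n \Bigm| \lambda \in \mathbb{R}^{\ell},\ \rho \in \mathbb{R}^{r}_+,\ \pi \in \mathbb{R}^{p}_+,\ \textstyle\sum_{i=1}^{p} \pi_i = 1 \,\Bigr\}.$$

When $\mathcal{P} \neq \emptyset$, we say that $\mathcal{P}$ is described by the
generator system $\mathcal{G} = \langle L, R, P \rangle$.
In the following, we will abuse notation by adopting the usual set operator
and relation symbols to denote the corresponding component-wise extensions
on systems. For instance, for $\mathcal{G} = \langle L, R, P \rangle$ and
$\mathcal{G}' = \langle L', R', P' \rangle$, we will write
$\mathcal{G} \subseteq \mathcal{G}'$ to mean $L \subseteq L'$, $R \subseteq R'$ and $P \subseteq P'$.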

The DD method due to Motzkin et al. [28] allows combining the constraints 
and the generators of a polyhedron P into a DD pair (C, G): a conversion proce- 
dure [10-12] is used to obtain each description starting from the other one, also 
removing the redundant elements. For presentation purposes, we focus on the 
conversion from constraints to generators; the opposite conversion works in the 
same way, using duality to switch the roles of constraints and generators. We 
do not describe lower level details such as the homogenization process, mapping 
the polyhedron into a polyhedral cone, or the simplification step, needed for 
computing DD pairs in minimal form. 
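The conversion procedure starts from a DD pair $(\mathcal{C}_0, \mathcal{G}_0)$
representing the whole vector space and adds, one at a time, the elements of the
input constraint system $\mathcal{C} = \{\beta_0, \ldots, \beta_m\}$, producing a
sequence of DD pairs $\{(\mathcal{C}_k, \mathcal{G}_k)\}_{0 \leq k \leq m+1}$
representing the polyhedra

$$\mathbb{R}^n = \mathcal{P}_0 \xrightarrow{\beta_0} \cdots \xrightarrow{\beta_{k-1}} \mathcal{P}_k \xrightarrow{\beta_k} \mathcal{P}_{k+1} \xrightarrow{\beta_{k+1}} \cdots \xrightarrow{\beta_m} \mathcal{P}_{m+1} = \mathcal{P}.$$

At each iteration, when adding the constraint $\beta_k$ to polyhedron
$\mathcal{P}_k = \mathrm{gen}(\mathcal{G}_k)$, the generator system $\mathcal{G}_k$
is partitioned into the three components $\mathcal{G}_k^+$, $\mathcal{G}_k^0$,
$\mathcal{G}_k^-$ according to the sign of the scalar products of the generators with
$\beta_k$ (those in $\mathcal{G}_k^0$ are the saturators of $\beta_k$); the new
generator system for polyhedron $\mathcal{P}_{k+1}$ is computed as
$\mathcal{G}_{k+1} = \mathcal{G}_k^+ \cup \mathcal{G}_k^0 \cup \mathcal{G}_k^\star$,
where $\mathcal{G}_k^\star = \mathrm{comb\_adj}_{\beta_k}(\mathcal{G}_k^+, \mathcal{G}_k^-)$ and

$$\mathrm{comb\_adj}_{\beta_k}(\mathcal{G}_k^+, \mathcal{G}_k^-) \stackrel{\mathrm{def}}{=} \bigl\{\, \mathrm{comb}_{\beta_k}(g^+, g^-) \bigm| g^+ \in \mathcal{G}_k^+,\ g^- \in \mathcal{G}_k^-,\ \mathrm{adj}_{\mathcal{P}_k}(g^+, g^-) \,\bigr\}.$$

Function '$\mathrm{comb}_{\beta_k}$' computes a linear combination of its arguments,
yielding a generator that saturates the constraint $\beta_k$; predicate
'$\mathrm{adj}_{\mathcal{P}_k}$' is used to select
only those pairs of generators that are adjacent in $\mathcal{P}_k$.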
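A single iteration of this procedure can be sketched as follows for the homogeneous case of a cone generated by rays only. This is a minimal illustration rather than the implementation discussed in the paper: the names `dot`, `comb` and `dd_step` are hypothetical, the constraint is assumed to be a non-strict inequality $a^T x \geq 0$, and the adjacency test is deliberately omitted, so every positive/negative pair is combined and the result may contain redundant generators.

```python
def dot(a, g):
    """Scalar product of the constraint coefficients a with generator g."""
    return sum(x * y for x, y in zip(a, g))


def comb(a, g_pos, g_neg):
    """Positive combination of g_pos and g_neg that saturates a.x >= 0.

    Assumes dot(a, g_pos) > 0 and dot(a, g_neg) < 0.
    """
    sp, sn = dot(a, g_pos), dot(a, g_neg)
    # sp * g_neg - sn * g_pos has scalar product sp*sn - sn*sp = 0 with a.
    return tuple(sp * n - sn * p for p, n in zip(g_pos, g_neg))


def dd_step(a, rays):
    """One (non-optimized) conversion step: add a.x >= 0 to cone(rays)."""
    pos = [g for g in rays if dot(a, g) > 0]
    zero = [g for g in rays if dot(a, g) == 0]
    neg = [g for g in rays if dot(a, g) < 0]
    # No adjacency test here (an assumption of this sketch): all pairs are used.
    star = [comb(a, gp, gn) for gp in pos for gn in neg]
    return pos + zero + star


# Non-negative quadrant of the plane, cut by x - y >= 0.
new_rays = dd_step((1, -1), [(1, 0), (0, 1)])
```

In this small instance the ray (0, 1) violates the constraint and is replaced by the combination (1, 1), which lies on the hyperplane x = y.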

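The set $\mathbb{CP}_n$ of all closed polyhedra on the vector space $\mathbb{R}^n$,
partially ordered by set inclusion, is a lattice
$\langle \mathbb{CP}_n, \subseteq, \emptyset, \mathbb{R}^n, \cap, \uplus \rangle$,
where the empty set and $\mathbb{R}^n$ are the bottom and top elements, the binary
meet operator is set intersection and the binary join operator '$\uplus$' is the
convex polyhedral hull. A constraint $\beta = (a^T x \bowtie b)$ is said to be valid
for $\mathcal{P} \in \mathbb{CP}_n$ if all the points in $\mathcal{P}$ satisfy $\beta$;
for each such $\beta$, the subset $F = \{ p \in \mathcal{P} \mid a^T p = b \}$ is a face
of $\mathcal{P}$. We write $\mathit{cFaces}_{\mathcal{P}}$ (possibly omitting the
subscript) to denote the finite set of faces of $\mathcal{P} \in \mathbb{CP}_n$.
This is a meet sublattice of $\mathbb{CP}_n$ and
$\mathcal{P} = \bigcup \{ \mathrm{relint}(F) \mid F \in \mathit{cFaces}_{\mathcal{P}} \}$.

When $\mathcal{C}$ is extended to allow for strict inequalities,
$\mathcal{P} = \mathrm{con}(\mathcal{C})$ is an NNC (not necessarily closed) polyhedron.
The set $\mathbb{P}_n$ of all NNC polyhedra on $\mathbb{R}^n$ is a lattice
$\langle \mathbb{P}_n, \subseteq, \emptyset, \mathbb{R}^n, \cap, \uplus \rangle$ and
$\mathbb{CP}_n$ is a sublattice of $\mathbb{P}_n$. As shown in [3, Theorem 4.4], a
description of an NNC polyhedron $\mathcal{P} \in \mathbb{P}_n$ can be obtained by
extending the generator system with a finite set $C$ of closure points. Namely, for
$\mathcal{G} = \langle L, R, C, P \rangle$, we define
$\mathcal{P} = \mathrm{gen}(\mathcal{G})$, where

$$\mathrm{gen}(\langle L, R, C, P \rangle) \stackrel{\mathrm{def}}{=} \Bigl\{\, L\lambda + R\rho + C\gamma + P\pi \in \mathbb{R}^n \Bigm| \lambda \in \mathbb{R}^{\ell},\ \rho \in \mathbb{R}^{r}_+,\ \gamma \in \mathbb{R}^{c}_+,\ \pi \in \mathbb{R}^{p}_+,\ \pi \neq 0,\ \textstyle\sum_{i=1}^{c} \gamma_i + \sum_{i=1}^{p} \pi_i = 1 \,\Bigr\}.$$

For an NNC polyhedron $\mathcal{P} \in \mathbb{P}_n$, the finite set
$\mathit{nncFaces}_{\mathcal{P}}$ of its faces is a meet sublattice of $\mathbb{P}_n$
and $\mathcal{P} = \bigcup \{ \mathrm{relint}(F) \mid F \in \mathit{nncFaces}_{\mathcal{P}} \}$.
Letting $\mathcal{Q} = \mathrm{cl}(\mathcal{P})$, the closure operator
$\mathrm{cl}\colon \mathit{nncFaces}_{\mathcal{P}} \to \mathit{cFaces}_{\mathcal{Q}}$
maps each NNC face of $\mathcal{P}$ into a face of $\mathcal{Q}$. The image
$\mathrm{cl}(\mathit{nncFaces}_{\mathcal{P}})$ is a join sublattice of
$\mathit{cFaces}_{\mathcal{Q}}$ and its nonempty elements form an upward closed
subset, which can be described by recording the minimal elements only (i.e., the
atoms of the $\mathit{nncFaces}_{\mathcal{P}}$ lattice).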
3 Direct Representations for NNC Polyhedra


An NNC polyhedron can be described by using an extended constraint system
$\mathcal{C} = \langle C_{=}, C_{\geq}, C_{>} \rangle$ and/or an extended generator
system $\mathcal{G} = \langle L, R, C, P \rangle$. These
representations are said to be geometric, meaning that they provide a precise
description of the position of their elements. For a closed polyhedron
$\mathcal{P} \in \mathbb{CP}_n$, the use of completely geometric representations is an
adequate choice. In the case of an NNC polyhedron $\mathcal{P} \in \mathbb{P}_n$ such
a choice is questionable, since the
precise geometric position of some of the elements is not really needed.


Example 1. Consider the NNC polyhedron $\mathcal{P} \in \mathbb{P}_2$ in the next
figure, where the (strict) inequality constraints are denoted by (dashed) lines and
the (closure) points are denoted by (unfilled) circles.


$\mathcal{P}$ is described by $\mathcal{G} = \langle L, R, C, P \rangle$, where
$L = R = \emptyset$, $C = \{c_0, c_1, c_2\}$ and $P = \{p_0, p_1\}$. However, there is
no need to know the position of point $p_1$, since it can be replaced by any other
point on the open segment $(c_0, c_1)$. Similarly, when considering the constraint
representation, there is no need to know the exact slope of the strict inequality
constraint $\beta$.

We now show that $\mathcal{P} \in \mathbb{P}_n$ can be more appropriately represented
by integrating a geometric description of
$\mathcal{Q} = \mathrm{cl}(\mathcal{P}) \in \mathbb{CP}_n$ (the skeleton) with a
combinatorial description of $\mathit{nncFaces}_{\mathcal{P}}$ (the non-skeleton). We
consider here the generator system representation; the extension to constraints will
be briefly outlined in a later section.


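Definition 1 (Skeleton of a generator system). Let
$\mathcal{G} = \langle L, R, C, P \rangle$ be a generator system in minimal form,
$\mathcal{P} = \mathrm{gen}(\mathcal{G})$ and $\mathcal{Q} = \mathrm{cl}(\mathcal{P})$.
The skeleton of $\mathcal{G}$ is
$SK_{\mathcal{Q}} = \mathrm{skel}(\mathcal{G}) \stackrel{\mathrm{def}}{=} \langle L, R, C \cup SP, \emptyset \rangle$,
where $SP \subseteq P$ holds the points that cannot be obtained by combining the other
generators in $\mathcal{G}$.


Note that the skeleton has no points at all, so that
$\mathrm{gen}(SK_{\mathcal{Q}}) = \emptyset$. However, we can define a variant function
$\overline{\mathrm{gen}}(\langle L, R, C, P \rangle) \stackrel{\mathrm{def}}{=} \mathrm{gen}(\langle L, R, \emptyset, C \cup P \rangle)$,
showing that the skeleton of an NNC polyhedron provides a non-redundant
representation of its topological closure.


Proposition 1. If $\mathcal{P} = \mathrm{gen}(\mathcal{G})$ and
$\mathcal{Q} = \mathrm{cl}(\mathcal{P})$, then
$\overline{\mathrm{gen}}(\mathcal{G}) = \overline{\mathrm{gen}}(SK_{\mathcal{Q}}) = \mathcal{Q}$.
Also, there does not exist $\mathcal{G}' \subset SK_{\mathcal{Q}}$ such that
$\overline{\mathrm{gen}}(\mathcal{G}') = \mathcal{Q}$.


The elements of $SP \subseteq P$ are called skeleton points; the non-skeleton points
in $P \setminus SP$ are redundant when representing the topological closure; these
non-skeleton points are the elements in $\mathcal{G}$ that need not be represented
geometrically.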
Consider a point $p \in \mathcal{Q} = \mathrm{cl}(\mathcal{P})$ (not necessarily in
$\mathcal{P}$). There exists a single face $F \in \mathit{cFaces}_{\mathcal{Q}}$ such
that $p \in \mathrm{relint}(F)$. By definition of function 'gen', point $p$
behaves as a filler for $\mathrm{relint}(F)$ meaning that, when combined with the
skeleton, it generates $\mathrm{relint}(F)$. Note that $p$ also behaves as a filler
for the relative interiors of all the faces in the set $\uparrow F$. The choice of
$p \in \mathrm{relint}(F)$ is actually arbitrary:
any other point of $\mathrm{relint}(F)$ would be equivalent as a filler. A less
arbitrary representation for $\mathrm{relint}(F)$ is thus provided by its own skeleton
$SK_F \subseteq SK_{\mathcal{Q}}$;
we say that $SK_F$ is the support for the points in $\mathrm{relint}(F)$ and that any
point $p' \in \mathrm{relint}(\overline{\mathrm{gen}}(SK_F)) = \mathrm{relint}(F)$ is a
materialization of $SK_F$.

In the following we will sometimes omit subscripts when clear from context. 


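Definition 2 (Support sets for a skeleton). Let $SK$ be the skeleton of an
NNC polyhedron and let $\mathcal{Q} = \overline{\mathrm{gen}}(SK) \in \mathbb{CP}_n$.
The set of all supports for $SK$ is defined as
$\mathbb{NS}_{SK} \stackrel{\mathrm{def}}{=} \{ SK_F \subseteq SK \mid F \in \mathit{cFaces}_{\mathcal{Q}} \}$.


We now define functions mapping a subset of the (geometric) points of an
NNC polyhedron into the set of supports filled by these points, and vice versa.


Definition 3 (Filled supports). Let $SK$ be the skeleton of the polyhedron
$\mathcal{P} \in \mathbb{P}_n$, $\mathcal{Q} = \mathrm{cl}(\mathcal{P})$ and
$\mathbb{NS}$ be the corresponding set of supports. The abstraction
function $\alpha_{SK}\colon \wp(\mathcal{Q}) \to \wp_{\uparrow}(\mathbb{NS})$ is
defined, for each $S \subseteq \mathcal{Q}$, as

$$\alpha_{SK}(S) \stackrel{\mathrm{def}}{=} \bigcup \bigl\{\, \uparrow SK_F \bigm| \exists p \in S,\ F \in \mathit{cFaces}_{\mathcal{Q}} \,.\, p \in \mathrm{relint}(F) \,\bigr\}.$$

The concretization function
$\gamma_{SK}\colon \wp_{\uparrow}(\mathbb{NS}) \to \wp(\mathcal{Q})$, for each
$NS \in \wp_{\uparrow}(\mathbb{NS})$, is defined as

$$\gamma_{SK}(NS) \stackrel{\mathrm{def}}{=} \bigcup \bigl\{\, \mathrm{relint}(\overline{\mathrm{gen}}(ns)) \bigm| ns \in NS \,\bigr\}.$$

Proposition 2. The pair of functions $(\alpha_{SK}, \gamma_{SK})$ is a Galois
connection. If $\mathcal{P} = \mathrm{gen}(\langle L, R, C, P \rangle) \in \mathbb{P}_n$
and $SK$ is its skeleton, then $\mathcal{P} = (\gamma_{SK} \circ \alpha_{SK})(P)$.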


The non-skeleton component of a geometric generator system can be
abstracted by '$\alpha_{SK}$' and described as a combination of skeleton generators.


Definition 4 (Non-skeleton of a generator system). Let
$\mathcal{P} \in \mathbb{P}_n$ be defined by generator system
$\mathcal{G} = \langle L, R, C, P \rangle$ and let $SK$ be the corresponding skeleton
component. The non-skeleton component of $\mathcal{G}$ is defined as
$NS_{\mathcal{G}} \stackrel{\mathrm{def}}{=} \alpha_{SK}(P)$.


Example 2. Consider the generator system $\mathcal{G}$ of polyhedron $\mathcal{P}$
from Example 1. Its skeleton is
$SK = \langle \emptyset, \emptyset, \{c_0, c_1, c_2, p_0\}, \emptyset \rangle$, so that
$p_1$ is not a skeleton point. By Definition 3,
$NS_{\mathcal{G}} = \alpha_{SK}(\{p_0, p_1\}) = \uparrow\{p_0\} \cup \uparrow\{c_0, c_1\}$.²
The minimal elements in $NS_{\mathcal{G}}$ can be seen to describe the atoms of
$\mathit{nncFaces}_{\mathcal{P}}$, i.e., the 0-dimension face $\{p_0\}$ and the
1-dimension open segment $(c_0, c_1)$.

² Since there are no rays and no lines, we adopt a simplified notation, identifying
each support with the set of its closure points. Also note that
$\mathrm{relint}(\{p_0\}) = \{p_0\}$.


The new representation is semantically equivalent to the fully geometric one.


Corollary 1. For a polyhedron $\mathcal{P} = \mathrm{gen}(\mathcal{G}) \in \mathbb{P}_n$,
let $\langle SK, NS \rangle$ be the skeleton and non-skeleton components for
$\mathcal{G}$. Then $\mathcal{P} = \gamma_{SK}(NS)$.


4 The New Conversion Algorithm 


The CONVERSION function in Pseudocode 1 incrementally processes each of the
input constraints $\beta \in \mathcal{C}_{in}$ keeping the generator system
$\langle SK, NS \rangle$ up-to-date.
The distinction between the skeleton and non-skeleton allows for a corresponding 
separation in the conversion procedure. Moreover, a few minor adaptations to 
their representation, discussed below, allow for efficiency improvements. 

First, observe that every support $ns \in NS$ always includes all of the lines in
the $L$ skeleton component; hence, these lines can be left implicit in the
representation of the supports in $NS$. Note that, even after removing the lines, each
$ns \in NS$ is still a non-empty set, since it includes at least one closure point.

When lines are implicit, those supports $ns \in NS$ that happen to be singletons³
can be seen to play a special role: they correspond to the combinatorial
encoding of the skeleton points in $SP$ (see Definition 1). These points are not
going to benefit from the combinatorial representation, hence we move them from
the non-skeleton to the skeleton component; namely,
$SK = \langle L, R, C \cup SP, \emptyset \rangle$ is represented as
$SK = \langle L, R, C, SP \rangle$. The formalization presented in Sect. 3 is still
valid, replacing '$\gamma_{SK}$' with
$\gamma'_{SK}(NS) \stackrel{\mathrm{def}}{=} \mathrm{gen}(SK) \cup \gamma_{SK}(NS)$.

At the implementation level, each support $ns \in NS$ can be encoded by using
a set of indices on the data structure representing the skeleton component $SK$.
Since $NS$ is a finite upward closed set, the representation only needs to record its
minimal elements. A support $ns \in NS$ is redundant in $\langle SK, NS \rangle$ if
there exists $ns' \in NS$ such that $ns' \subset ns$ or if $ns \cap SP \neq \emptyset$,
where $SK = \langle L, R, C, SP \rangle$. We write $NS_1 \oplus NS_2$ to denote the
non-redundant union of $NS_1, NS_2 \subseteq \mathbb{NS}_{SK}$.
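The redundancy rule and the non-redundant union can be illustrated with a short sketch in which each support is a `frozenset` of skeleton indices. The function name `nonredundant_union` and the encoding of the skeleton points as a plain set of indices are assumptions made for this illustration only.

```python
def nonredundant_union(ns1, ns2, skel_points=frozenset()):
    """Union of two sets of supports, keeping only non-redundant ones.

    A support is dropped if it contains a skeleton point (the singleton for
    that point already fills it) or if it strictly contains another support.
    """
    candidates = {frozenset(s) for s in ns1} | {frozenset(s) for s in ns2}
    # Rule 1: supports intersecting SP are redundant.
    candidates = {s for s in candidates if not (s & skel_points)}
    # Rule 2: keep only the minimal elements w.r.t. set inclusion.
    return {s for s in candidates if not any(t < s for t in candidates)}


merged = nonredundant_union([{0, 1}], [{0, 1, 2}, {3, 4}], skel_points={4})
```

Here `{0, 1, 2}` is discarded because it strictly contains `{0, 1}`, and `{3, 4}` is discarded because index 4 is assumed to denote a skeleton point.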


4.1 Processing the Skeleton 


Line 3 of CONVERSION partitions the skeleton $SK$ into $SK^+$, $SK^0$ and $SK^-$,
according to the signs of the scalar products with constraint $\beta$. Note that the
partition information is logically computed (no copies are performed) and it is
stored in the $SK$ component itself; therefore, any update to $SK^+$, $SK^0$ and
$SK^-$ directly propagates to $SK$. In line 7 the generators in $SK^+$ and $SK^-$ are
combined to produce $SK^\star$, which is merged into $SK^0$. These steps are similar to
the ones for closed polyhedra, except that we now have to consider more kinds of
combinations: the systematic case analysis is presented in Table 1. For instance,
when processing a non-strict inequality $\beta_{\geq}$, if we combine a closure point
in $SK^+$ with a ray in $SK^-$ we obtain a closure point in $SK^\star$ (row 3,
column 6). Since it is restricted to work on the skeleton component, this combination
phase can safely apply the adjacency tests to quickly get rid of redundant elements.

³ By 'singleton' here we mean a system
$ns = \langle \emptyset, \emptyset, \{p\}, \emptyset \rangle$.
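Pseudocode 1. Incremental conversion from constraints to generators.

 1: function CONVERSION($\mathcal{C}_{in}$, $\langle SK, NS \rangle$)
 2:   for all $\beta \in \mathcal{C}_{in}$ do
 3:     skel_partition($\beta$, $SK$);
 4:     nonskel_partition($\langle SK, NS \rangle$);
 5:     if line $l \in SK^+ \cup SK^-$ then VIOLATING-LINE($\beta$, $l$, $\langle SK, NS \rangle$);
 6:     else
 7:       $SK^\star \leftarrow \mathrm{comb\_adj}_\beta(SK^+, SK^-)$; $SK^0 \leftarrow SK^0 \cup SK^\star$;
 8:       $NS^\star \leftarrow$ MOVE-NS($\beta$, $\langle SK, NS \rangle$);
 9:       $NS^\star \leftarrow NS^\star \cup$ CREATE-NS($\beta$, $\langle SK, NS \rangle$);
10:       if is_equality($\beta$) then $\langle SK, NS \rangle \leftarrow \langle SK^0, NS^0 \oplus NS^\star \rangle$;
11:       else if is_strict_ineq($\beta$) then
12:         $SK^0 \leftarrow$ points_become_closure_points($SK^0$);
13:         $\langle SK, NS \rangle \leftarrow \langle SK^+ \cup SK^0, NS^+ \oplus NS^\star \rangle$;
14:       else $\langle SK, NS \rangle \leftarrow \langle SK^+ \cup SK^0, (NS^+ \cup NS^0) \oplus NS^\star \rangle$;
15:     PROMOTE-SINGLETONS($\langle SK, NS \rangle$);
16:   return $\langle SK, NS \rangle$;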


Table 1. Case analysis for function '$\mathrm{comb}_\beta$' when adding an equality
($\beta_{=}$), a non-strict ($\beta_{\geq}$) or a strict ($\beta_{>}$) inequality
constraint to a pair of generators from $SK^+$ and $SK^-$
(R = ray, C = closure point, SP = skeleton point).

                                 | $SK^+$     | R | R | R  | C | C | C  | SP | SP | SP
                                 | $SK^-$     | R | C | SP | R | C | SP | R  | C  | SP
$\beta_{=}$ or $\beta_{\geq}$    | $SK^\star$ | R | C | SP | C | C | SP | SP | SP | SP
$\beta_{>}$                      | $SK^\star$ | R | C | C  | C | C | C  | C  | C  | C
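The case analysis of Table 1 is small enough to be written as a function. The sketch below is only a compact restatement of the table; the name `comb_kind` and the string tags "R", "C" and "SP" are hypothetical and not taken from any implementation.

```python
def comb_kind(kind_pos, kind_neg, strict):
    """Kind of generator obtained by combining one from SK+ with one from SK-.

    kind_pos, kind_neg: "R" (ray), "C" (closure point) or "SP" (skeleton point).
    strict: True for a strict inequality, False for an equality or a
    non-strict inequality.
    """
    kinds = {kind_pos, kind_neg}
    if kinds == {"R"}:
        return "R"  # two rays always combine into a ray
    if strict or "SP" not in kinds:
        # The result saturates the constraint, so for a strict inequality it
        # cannot be a point of the polyhedron.
        return "C"
    return "SP"


# Columns in the same order as Table 1.
COLUMNS = [(p, n) for p in ("R", "C", "SP") for n in ("R", "C", "SP")]
ROW_NON_STRICT = [comb_kind(p, n, strict=False) for p, n in COLUMNS]
ROW_STRICT = [comb_kind(p, n, strict=True) for p, n in COLUMNS]
```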


4.2 Processing the Non-skeleton 


Line 4 partitions the supports in $NS$ by exploiting the partition information for
the skeleton $SK$, so that no additional scalar product is computed. Namely, each
support $ns \in NS$ is classified as follows:

$ns \in NS^+ \iff ns \subseteq (SK^+ \cup SK^0) \land ns \cap SK^+ \neq \emptyset$;

$ns \in NS^0 \iff ns \subseteq SK^0$;

$ns \in NS^- \iff ns \subseteq (SK^- \cup SK^0) \land ns \cap SK^- \neq \emptyset$;

$ns \in NS^\pm \iff ns \cap SK^+ \neq \emptyset \land ns \cap SK^- \neq \emptyset$.


This partitioning is consistent with the previous one. For instance, if
$ns \in NS^+$, then for every possible materialization
$p \in \mathrm{relint}(\overline{\mathrm{gen}}(ns))$ the scalar product of $p$
and $\beta$ is strictly positive. The supports in $NS^\pm$ are those whose
materializations can satisfy, saturate and violate the constraint $\beta$ (i.e., the
corresponding face crosses the constraint hyperplane).
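The classification above only needs set operations on skeleton indices. The following sketch assumes that the three skeleton partitions are given as Python sets of generator labels and that every support is a subset of their union; the function name `classify_support` and the returned tags are hypothetical.

```python
def classify_support(ns, sk_pos, sk_zero, sk_neg):
    """Classify a support as "+", "0", "-" or "+-" (crossing the hyperplane)."""
    ns = frozenset(ns)
    assert ns <= (sk_pos | sk_zero | sk_neg), "support must use skeleton elements only"
    hits_pos = bool(ns & sk_pos)
    hits_neg = bool(ns & sk_neg)
    if hits_pos and hits_neg:
        return "+-"  # some materializations satisfy, some violate the constraint
    if hits_pos:
        return "+"
    if hits_neg:
        return "-"
    return "0"  # the support is entirely made of saturators


# Partition assumed for illustration: c0, c1 satisfy the constraint strictly,
# c2, c3 violate it, and no skeleton element saturates it yet.
pos, zero, neg = {"c0", "c1"}, set(), {"c2", "c3"}
tag = classify_support({"c0", "c3"}, pos, zero, neg)
```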
In lines 8 and 9, we find the calls to the two main functions processing the
non-skeleton component. A set $NS^\star$ of new supports is built as the union of the
contributions provided by functions MOVE-NS and CREATE-NS.


Moving Supports. The MOVE-NS function, shown in Pseudocode 2, processes
the supports in $NS^\pm$: this function "moves" the fillers of the faces that are
crossed by the new constraint, making sure they lie on the correct side.

Let $ns \in NS^\pm$ and $F = \mathrm{relint}(\overline{\mathrm{gen}}(ns))$. Note that
$ns = SK_F$ before the addition of the new constraint $\beta$; at this point, the
elements in $SK^\star$ have been added to $SK^0$, but this change still has to be
propagated to the non-skeleton component $NS$. Therefore, we compute the support
closure '$\mathrm{supp.cl}_{SK}(ns)$' according to the updated skeleton $SK$.
Intuitively, $\mathrm{supp.cl}_{SK}(ns) \subseteq SK$ is the subset of all
the skeleton elements that are included in face $F$.

At the implementation level, support closures can be efficiently computed by
exploiting the same saturation information used for the adjacency tests. Namely,
for constraints $\mathcal{C}$ and generators $\mathcal{G}$, we can define

$$\mathrm{sat.inter}_{\mathcal{C}}(\mathcal{G}) \stackrel{\mathrm{def}}{=} \{\, \beta' \in \mathcal{C} \mid \forall g \in \mathcal{G} : g \text{ saturates } \beta' \,\},$$
$$\mathrm{sat.inter}_{\mathcal{G}}(\mathcal{C}) \stackrel{\mathrm{def}}{=} \{\, g \in \mathcal{G} \mid \forall \beta' \in \mathcal{C} : g \text{ saturates } \beta' \,\}.$$

Then, if $\mathcal{C}$ and $SK = \langle L, R, C, SP \rangle$ are the constraint system
and the skeleton generator system for the polyhedron, for each $ns \in NS$ we can
compute [26]:

$$\mathrm{supp.cl}_{SK}(ns) \stackrel{\mathrm{def}}{=} \mathrm{sat.inter}_{SK}\bigl(\mathrm{sat.inter}_{\mathcal{C}}(ns)\bigr) \setminus L.$$
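The support closure can be sketched directly from the two saturation intersections. In the snippet below, `sat` is an assumed dictionary mapping each skeleton generator to the set of constraints it saturates; the function name `supp_cl`, the constraint labels and the square used as test data are illustrative choices, not data from the paper.

```python
def supp_cl(ns, sat, lines=frozenset()):
    """Support closure of a non-empty support ns.

    sat[g] is the set of constraints saturated by skeleton generator g.
    """
    # sat.inter_C(ns): constraints saturated by every generator in ns.
    common = set.intersection(*(set(sat[g]) for g in ns))
    # sat.inter_SK(...): skeleton generators saturating all those constraints.
    closure = frozenset(g for g in sat if common <= set(sat[g]))
    return closure - frozenset(lines)


# Hypothetical square with vertices c0..c3, plus c4 and c5 lying in the
# relative interior of the edges (c0, c3) and (c1, c2), respectively.
sat = {
    "c0": {"left", "bottom"},
    "c1": {"bottom", "right"},
    "c2": {"right", "top"},
    "c3": {"top", "left"},
    "c4": {"left"},
    "c5": {"right"},
}
closure = supp_cl({"c0", "c3"}, sat)
```

With this data the closure of the support `{c0, c3}` picks up `c4`, the only other skeleton element lying on the same edge.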


Face $F$ is split by constraint $\beta$ into $F^+$, $F^0$ and $F^-$. When $\beta$ is a
strict inequality, only $F^+$ shall be kept in the polyhedron; when the new constraint
is a non-strict inequality, both $F^+$ and $F^0$ shall be kept. A minimal non-skeleton
representation for these subsets can be obtained by projecting the support:

$$\mathrm{proj}^{\beta}_{SK}(ns) \stackrel{\mathrm{def}}{=} \begin{cases} ns \setminus SK^-, & \text{if } \beta \text{ is a strict inequality;} \\ ns \cap SK^0, & \text{otherwise.} \end{cases}$$

To summarize, by composing support closure and projection in line 3 of
MOVE-NS, each support in $NS^\pm$ is moved to the correct side of $\beta$.


Example 3. Consider $\mathcal{P} \in \mathbb{P}_2$ in the left hand side of the next
figure.


The skeleton $SK = \langle \emptyset, \emptyset, C, \emptyset \rangle$ contains the
closure points in $C = \{c_0, c_1, c_2, c_3\}$; the non-skeleton $NS = \{ns\}$ contains
a single support $ns = \{c_0, c_3\}$, which makes sure that the open segment
$(c_0, c_3)$ is included in $\mathcal{P}$; the figure shows a
single materialization for $ns$.

When processing $\beta = (y < 1)$, we obtain the polyhedron in the right hand
side of the figure. In the skeleton phase of the CONVERSION function the adjacent
skeleton generators are combined: $c_4$ (from $c_0 \in SK^+$ and $c_3 \in SK^-$) and
$c_5$ (from $c_1 \in SK^+$ and $c_2 \in SK^-$) are added to $SK^0$. Since the
non-skeleton support $ns$ belongs to $NS^\pm$, it is processed in the MOVE-NS function:

$$ns^\star = \mathrm{proj}^{\beta}_{SK}\bigl(\mathrm{supp.cl}_{SK}(ns)\bigr) = \mathrm{proj}^{\beta}_{SK}(\{c_0, c_3, c_4\}) = \{c_0, c_4\}.$$

In contrast, if we were processing the non-strict inequality $\beta' = (y \leq 1)$, we
would have obtained
$ns' = \mathrm{proj}^{\beta'}_{SK}(\mathrm{supp.cl}_{SK}(ns)) = \{c_4\}$. Since $ns'$
is a singleton, it is upgraded to become a skeleton point by procedure
PROMOTE-SINGLETONS. Hence, in this case the new skeleton is
$SK = \langle \emptyset, \emptyset, C, SP \rangle$, where $C = \{c_0, c_1, c_5\}$
and $SP = \{c_4\}$, while the non-skeleton component is empty.
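The two outcomes of Example 3 can be replayed with a small sketch of the projection step. The function name `proj` is hypothetical, and the strict case is read here as the removal of the elements of the negative partition, which is the reading required to obtain the supports stated in the example.

```python
def proj(ns, sk_zero, sk_neg, strict):
    """Project a (closed) support onto the side kept by the constraint."""
    ns = frozenset(ns)
    if strict:
        return ns - frozenset(sk_neg)   # drop the violating skeleton elements
    return ns & frozenset(sk_zero)      # keep only the saturators


# Data from Example 3, after c4 and c5 have been added to the saturators.
sk_zero = {"c4", "c5"}
sk_neg = {"c2", "c3"}
closed_support = {"c0", "c3", "c4"}     # support closure of ns = {c0, c3}

moved_strict = proj(closed_support, sk_zero, sk_neg, strict=True)
moved_non_strict = proj(closed_support, sk_zero, sk_neg, strict=False)
```

The non-strict case yields a singleton, which is the situation in which the support is promoted to a skeleton point.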


Creating New Supports. Consider the case of a support $ns \in NS^-$ violating
a non-strict inequality constraint $\beta$: this support has to be removed from $NS$.
However, the upward closed set $NS$ is represented by its minimal elements only
so that, by removing $ns$, we are also implicitly removing other supports from
the set $\uparrow ns$, including some that do not belong to $NS^-$ and hence should be
kept. Therefore, we have to explore the set of faces and detect those that are
going to lose their filler: their minimal supports will be added to $NS^\star$.
Similarly, when processing a non-strict inequality constraint, we need to consider the
new faces introduced by the constraint: the corresponding supports can be found by
projecting on the constraint hyperplane those faces that are possibly filled by
an element in $SP^+$ or $NS^+$.

This is the task of the CREATE-NS function, shown in Pseudocode 2. It uses
ENUMERATE-FACES as a helper:⁴ the latter provides an enumeration of all the
(higher dimensional) faces that contain the initial support $ns$. The new faces are
obtained by adding to $ns$ a new generator $g$ and then composing the support
closure and projection functions, as done in MOVE-NS. For efficiency purposes,
a case analysis is performed so as to restrict the search area of the enumeration
phase, by considering only the faces crossing the constraint.


Example 4. Consider $\mathcal{P} \in \mathbb{P}_2$ in the left hand side of the next
figure, described by skeleton
$SK = \langle \emptyset, \emptyset, \{c_0, c_1, c_2\}, \{p\} \rangle$ and non-skeleton
$NS = \emptyset$.


⁴ This enumeration phase is inspired by the algorithm in [26].
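Pseudocode 2. Helper functions for moving and creating supports.

 1: function MOVE-NS($\beta$, $\langle SK, NS \rangle$)
 2:   $NS^\star \leftarrow \emptyset$;
 3:   for all $ns \in NS^\pm$ do $NS^\star \leftarrow NS^\star \cup \{\mathrm{proj}^{\beta}_{SK}(\mathrm{supp.cl}_{SK}(ns))\}$;
 4:   return $NS^\star$;
 5: function CREATE-NS($\beta$, $\langle SK, NS \rangle$)
 6:   $NS^\star \leftarrow \emptyset$;
 7:   let $SK = \langle L, R, C, SP \rangle$;
 8:   for all $ns \in NS^- \cup \{\{p\} \mid p \in SP^-\}$ do
 9:     $NS^\star \leftarrow NS^\star \cup$ ENUMERATE-FACES($\beta$, $ns$, $SK^+$, $SK$);
10:   if is_strict_ineq($\beta$) then
11:     for all $ns \in NS^0 \cup \{\{p\} \mid p \in SP^0\}$ do
12:       $NS^\star \leftarrow NS^\star \cup$ ENUMERATE-FACES($\beta$, $ns$, $SK^+$, $SK$);
13:   else
14:     for all $ns \in NS^+ \cup \{\{p\} \mid p \in SP^+\}$ do
15:       $NS^\star \leftarrow NS^\star \cup$ ENUMERATE-FACES($\beta$, $ns$, $SK^-$, $SK$);
16:   return $NS^\star$;
17: function ENUMERATE-FACES($\beta$, $ns$, $SK'$, $SK$)
18:   $NS^\star \leftarrow \emptyset$; let $SK' = \langle L', R', C', SP' \rangle$;
19:   for all $g \in (R' \cup C')$ do $NS^\star \leftarrow NS^\star \cup \{\mathrm{proj}^{\beta}_{SK}(\mathrm{supp.cl}_{SK}(ns \cup \{g\}))\}$;
20:   return $NS^\star$;
21: procedure PROMOTE-SINGLETONS($\langle SK, NS \rangle$)
22:   let $SK = \langle L, R, C, SP \rangle$;
23:   for all $ns \in NS$ such that $ns = \langle \emptyset, \emptyset, \{c\}, \emptyset \rangle$ do
24:     $NS \leftarrow NS \setminus \{ns\}$; $C \leftarrow C \setminus \{c\}$; $SP \leftarrow SP \cup \{c\}$;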


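Pseudocode 3. Processing a line violating constraint $\beta$.

 1: procedure VIOLATING-LINE($\beta$, $l$, $\langle SK, NS \rangle$)
 2:   split $l$ into rays $r^+$ satisfying $\beta$ and $r^-$ violating $\beta$;
 3:   $l \leftarrow r^+$;
 4:   for all $g \in SK$ do $g \leftarrow \mathrm{comb}_\beta(g, l)$;
 5:   if is_equality($\beta$) then $SK \leftarrow SK^0$;
 6:   if is_strict_ineq($\beta$) then STRICT-ON-EQ-POINTS($\beta$, $\langle SK, NS \rangle$);
 7: procedure STRICT-ON-EQ-POINTS($\beta$, $\langle SK, NS \rangle$)
 8:   $NS^\star \leftarrow \emptyset$; let $SK^0 = \langle L^0, R^0, C^0, SP^0 \rangle$;
 9:   for all $ns \in NS^0 \cup \{\{p\} \mid p \in SP^0\}$ do
10:     $NS^\star \leftarrow NS^\star \cup$ ENUMERATE-FACES($\beta$, $ns$, $SK^+$, $SK$);
11:   $SK^0 \leftarrow$ points_become_closure_points($SK^0$);
12:   $\langle SK, NS \rangle \leftarrow \langle SK^+ \cup SK^0, NS^+ \oplus NS^\star \rangle$;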


The partition for $SK$ induced by the non-strict inequality is as follows:
$SK^+ = \langle \emptyset, \emptyset, \emptyset, \{p\} \rangle$,
$SK^0 = \langle \emptyset, \emptyset, \{c_0, c_2\}, \emptyset \rangle$,
$SK^- = \langle \emptyset, \emptyset, \{c_1\}, \emptyset \rangle$.


There are no adjacent generators in $SK^+$ and $SK^-$, so that $SK^\star$ is empty.
When processing the non-skeleton component, the skeleton point in $SK^+$ will be
considered in line 15 of function CREATE-NS. The corresponding call to function
ENUMERATE-FACES computes

$$ns^\star = \mathrm{proj}^{\beta}_{SK}\bigl(\mathrm{supp.cl}_{SK}(\{p\} \cup \{c_1\})\bigr) = \mathrm{proj}^{\beta}_{SK}(\{c_0, c_1, c_2, p\}) = \{c_0, c_2\},$$

thereby producing the filler for the open segment $(c_0, c_2)$. The resulting
polyhedron, shown in the right hand side of the figure, is thus described by the
skeleton $SK = \langle \emptyset, \emptyset, \{c_0, c_2\}, \{p\} \rangle$ and the
non-skeleton $NS = \{ns^\star\}$.
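The computation of Example 4 can be replayed with a compact sketch of the face enumeration helper. Everything below is illustrative: the function names, the labels of the four edges, and the saturation data are assumptions chosen so that $p$ and $c_1$ are opposite vertices of the closure, as the example requires (no common saturated constraint).

```python
def supp_cl(ns, sat):
    """Skeleton generators saturating every constraint saturated by all of ns."""
    common = set.intersection(*(set(sat[g]) for g in ns))
    return frozenset(g for g in sat if common <= set(sat[g]))


def proj(ns, sk_zero, sk_neg, strict):
    """Keep the part of the support lying on the side retained by the constraint."""
    return ns - sk_neg if strict else ns & sk_zero


def enumerate_faces(ns, side, sat, sk_zero, sk_neg, strict):
    """Supports of the faces obtained by adding one generator of `side` to ns.

    `side` stands for the rays and closure points of the skeleton partition
    passed as third argument in the pseudocode.
    """
    return {proj(supp_cl(ns | {g}, sat), sk_zero, sk_neg, strict) for g in side}


# Hypothetical quadrilateral closure with vertices p, c0, c1, c2 (in this order).
sat = {
    "p": {"e_p_c0", "e_c2_p"},
    "c0": {"e_p_c0", "e_c0_c1"},
    "c1": {"e_c0_c1", "e_c1_c2"},
    "c2": {"e_c1_c2", "e_c2_p"},
}
sk_zero = frozenset({"c0", "c2"})
sk_neg = frozenset({"c1"})

new_supports = enumerate_faces(frozenset({"p"}), {"c1"}, sat,
                               sk_zero, sk_neg, strict=False)
```

Only index-set operations are involved: no linear combination of generators is computed in this phase.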

It is worth noting that, when handling Example 4 adopting an entirely geometric
representation, closure point $c_1$ needs to be combined with point $p$ even if
the two generators are not adjacent: this leads to a significant efficiency penalty.
Similarly, an implementation based on the $\epsilon$-representation will have to
combine closure point $c_1$ with point $p$ (and/or with some other
$\epsilon$-redundant points), because the addition of the slack variable makes them
adjacent. Therefore, an implementation based on the new approach obtains a twofold
benefit: first, the distinction between skeleton and non-skeleton allows for
restricting the handling of non-adjacent combinations to the non-skeleton phase;
second, thanks to the combinatorial representation, the non-skeleton component can be
processed by using set index operations only, i.e., computing no linear combination
at all.


Preparing for Next Iteration. In lines 10 to 15 of CONVERSION the generator
system is updated for the next iteration. The new supports in $NS^\star$ are merged
(using '$\oplus$' to remove redundancies) into the appropriate portions of the
non-skeleton component. In particular, when processing a strict inequality, in line 12
the helper function

$$\text{points\_become\_closure\_points}(\langle L, R, C, SP \rangle) \stackrel{\mathrm{def}}{=} \langle L, R, C \cup SP, \emptyset \rangle$$

is applied to $SK^0$, making sure that all of the skeleton points saturating $\beta$
are transformed into closure points having the same position. The final processing
step (line 15) calls helper procedure PROMOTE-SINGLETONS (see Pseudocode 2),
making sure that all singleton supports get promoted to skeleton points.
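The two bookkeeping helpers mentioned here can be sketched as pure functions on a skeleton stored as a 4-tuple of `frozenset`s (lines, rays, closure points, skeleton points) and a non-skeleton stored as a set of `frozenset` supports. This encoding and the Python names are assumptions made for illustration; the procedures in the pseudocode update the data structures in place.

```python
def points_become_closure_points(sk):
    """Turn every skeleton point of sk into a closure point at the same position."""
    lines, rays, closure_pts, skel_pts = sk
    return (lines, rays, closure_pts | skel_pts, frozenset())


def promote_singletons(sk, ns):
    """Move singleton supports from the non-skeleton to the skeleton points."""
    lines, rays, closure_pts, skel_pts = sk
    promoted = frozenset(
        g for s in ns if len(s) == 1 for g in s if g in closure_pts
    )
    new_sk = (lines, rays, closure_pts - promoted, skel_pts | promoted)
    new_ns = {s for s in ns if not (len(s) == 1 and s <= promoted)}
    return new_sk, new_ns


# Final step of Example 3 for the non-strict inequality: ns' = {c4} is promoted.
empty = frozenset()
sk = (empty, empty, frozenset({"c0", "c1", "c4", "c5"}), empty)
new_sk, new_ns = promote_singletons(sk, {frozenset({"c4"})})
```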

Note that line 5 of CONVERSION, by calling procedure VIOLATING-LINE (see
Pseudocode 3), handles the special case of a line violating $\beta$. This is just an
optimization: the helper procedure STRICT-ON-EQ-POINTS can be seen as a tailored
version of CREATE-NS, also including the final updating of $SK$ and $NS$.
4.3 Duality


The definitions given in Sect. 3 for a geometric generator system have their dual 
versions working on a geometric constraint system. We provide a brief overview 
of these correspondences, which are summarized in Table 2. 


Table 2. Correspondences between generator and constraint concepts.

                           | Generators               | Constraints
Geometric skeleton         |                          |
  singular                 | line                     | equality
  non-singular             | ray or closure point     | non-strict inequality
  semantics                | $\mathrm{gen}(SK) = \emptyset$ | $\mathrm{con}(SK) = \mathrm{cl}(\mathcal{P})$
Combinatorial non-skeleton |                          |
  abstracts                | point                    | strict inequality
  element role             | face filler              | face cutter
  represents               | upward closed set        | downward closed set
  encoding                 | minimal support          | minimal support
  singleton                | skeleton point           | skeleton strict inequality


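For a non-empty $\mathcal{P} = \mathrm{con}(\mathcal{C}) \in \mathbb{P}_n$, the
skeleton of $\mathcal{C} = \langle C_{=}, C_{\geq}, C_{>} \rangle$ includes
the non-redundant constraints defining $\mathcal{Q} = \mathrm{cl}(\mathcal{P})$.
Denoting by $SC_{>}$ the skeleton strict inequalities (i.e., those whose corresponding
non-strict inequality is not redundant for $\mathcal{Q}$), we have
$SK_{\mathcal{Q}} \stackrel{\mathrm{def}}{=} \langle C_{=}, C_{\geq} \cup SC_{>}, \emptyset \rangle$,
so that $\mathcal{Q} = \mathrm{con}(SK_{\mathcal{Q}})$.

The ghost faces of $\mathcal{P}$ are the faces of the closure $\mathcal{Q}$ that do not
intersect $\mathcal{P}$:
$\mathit{gFaces}_{\mathcal{P}} \stackrel{\mathrm{def}}{=} \{ F \in \mathit{cFaces}_{\mathcal{Q}} \mid F \cap \mathcal{P} = \emptyset \}$;
thus, $\mathcal{P} = \mathrm{con}(SK_{\mathcal{Q}}) \setminus \bigcup \mathit{gFaces}_{\mathcal{P}}$.
The set $\mathit{gFaces}' \stackrel{\mathrm{def}}{=} \mathit{gFaces} \cup \{\mathcal{Q}\}$
is a meet sublattice of $\mathit{cFaces}$; also, $\mathit{gFaces}$ is
downward closed and can be represented by its maximal elements.

The skeleton support $SK_F$ of a face $F \in \mathit{cFaces}_{\mathcal{Q}}$ is defined
as the set of all the skeleton constraints that are saturated by all the points in $F$.
Each face $F \in \mathit{gFaces}$ saturates a strict inequality
$\beta_{>} \in C_{>}$: we can represent such a face using its skeleton support $SK_F$
of which $\beta_{>}$ is a possible materialization.
A constraint system non-skeleton component $NS \subseteq \mathbb{NS}$ is thus a
combinatorial representation of the strict inequalities of the polyhedron.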

Hence, the non-skeleton components for generators and constraints have a 
complementary role: in the case of generators they are face fillers, marking the 
minimal faces that are included in nncFaces; in the case of constraints they are 
face cutters, marking the maximal faces that are excluded from nncFaces. Note
that the non-redundant cutters in gFaces are those having a minimal skeleton 
support, as is the case for the fillers. 

As it happens with lines, all the equalities in $C_=$ are included in all the
supports $ns \in NS$ so that, for efficiency, they are not represented explicitly.
After removing the equalities, a singleton $ns \in NS$ stands for a skeleton strict
inequality constraint, which is better represented in the skeleton component,
thereby obtaining $SK = \langle C_=, C_\geq, SC_> \rangle$. Hence, a support
$ns \in NS$ is redundant if there exists $ns' \in NS$ such that $ns' \subset ns$
or if $ns \cap SC_> \neq \emptyset$.
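The redundancy rule for supports can be illustrated with a small Python sketch. It assumes, purely for illustration, that skeleton constraints are identified by integer indices, that each support is a frozenset of such indices (equalities already removed), and that `skeleton_strict` holds the indices of the skeleton strict inequalities. The name `prune_supports` is hypothetical and is not part of the PPL.

```python
def prune_supports(ns, skeleton_strict):
    """Drop the redundant supports from a constraint-side non-skeleton component.

    ns              -- iterable of frozensets of skeleton-constraint indices
    skeleton_strict -- set of indices of the skeleton strict inequalities (SC_>)
    """
    supports = set(ns)
    kept = set()
    for s in supports:
        # Redundant if the support contains a skeleton strict inequality ...
        if s & skeleton_strict:
            continue
        # ... or if some other support is a proper subset of it.
        if any(t < s for t in supports):
            continue
        kept.add(s)
    return kept


# Illustrative data only: four supports over constraint indices 1..6.
ns = [frozenset({1, 2}), frozenset({1, 2, 3}), frozenset({4, 5}), frozenset({3, 6})]
minimal = prune_supports(ns, skeleton_strict={6})
```

Here `{1, 2, 3}` is dropped because `{1, 2}` is a proper subset of it, and `{3, 6}` is dropped because it contains the skeleton strict inequality 6.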

When the concepts underlying the skeleton and non-skeleton representation 
are reinterpreted as discussed above, it is possible to define a conversion proce- 
dure mapping a generator representation into a constraint representation which 
is very similar to the one from constraints to generators. 


5 Experimental Evaluation 


The new representation and conversion algorithms for NNC polyhedra have been 
implemented and tested in the context of the PPL (Parma Polyhedra Library). 
A full integration in the PPL domain of NNC polyhedra is not possible, since the
latter assumes the presence of the slack variable ε. The approach, summarized by
the diagram in Fig. 1, is to intercept each call to the PPL's conversion (working
on ε-representations in $\mathbb{CP}_{n+1}$) and pair it with a corresponding call
to the new algorithm (working on the new representations in $\mathbb{P}_n$).


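[Figure 1 (diagram): the input ε-representation C_in (resp., G_in) feeds 'old
conversion', yielding an ε-representation DD pair; in parallel, 'ε-less encoding'
derives an ε-less C_in (resp., G_in) that feeds 'new conversion', yielding a
skeleton/non-skeleton DD pair; both outputs enter a final 'correctness check'.]

Fig. 1. High level diagram for the experimental evaluation (non-incremental case).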


On the left hand side of the diagram we see the application of the standard
PPL conversion procedure: the input ε-representation is processed by 'old
conversion' so as to produce the output ε-representation DD pair. The 'ε-less
encoding' phase produces a copy of the input without the slack variable; this is
processed by 'new conversion' to produce the output DD pair, based on the new
skeleton/non-skeleton representation. After the two conversions are completed,
the outputs are checked for both semantic equivalence and non-redundancy. This
final checking phase was successful on all the experiments performed, which
include all of the tests in the PPL. In order to assess efficiency, additional code
was added to measure the time spent inside the old and new conversion procedures,
disregarding the input encoding and output checking phases. It is worth stressing
that several experimental evaluations, including recent ones [2], confirm that the
PPL is a state-of-the-art implementation of the DD method for a wide spectrum of
application contexts.

The first experiment⁵ on efficiency is meant to evaluate the overhead incurred
by the new representation and algorithm for NNC polyhedra when processing
topologically closed polyhedra, so as to compare it with the corresponding
overhead incurred by the ε-representation. To this end, we considered the ppl_lcdd
demo application of the PPL, which solves the vertex/facet enumeration problem.
In Table 3 we report the results obtained on a selection of the test benchmarks⁶
when using: the conversion algorithm for closed polyhedra (columns 2-3); the
conversion algorithm for the ε-representation of NNC polyhedra (columns 4-5);
and the new conversion algorithm for the new representation of NNC polyhedra
(columns 6-7). Columns 'time' report the number of milliseconds spent; columns
'sat' report the number of saturation (i.e., bit vector) operations, in millions.

The results in Table 3 show that the use of the ε-representation for closed
polyhedra incurs a significant overhead. In contrast, the new representation and
algorithm go beyond all expectations: in almost all of the tests there is no
overhead at all (that is, any overhead incurred is so small as to be masked by the
improvements obtained in other parts of the algorithm).


Table 3. Overhead of conversion for C polyhedra. Units: time (ms), sat (M).

test             |  closed poly    |     ε-repr       |   ⟨SK, NS⟩
                 |  time     sat   |   time     sat   |  time     sat
cp6.ext          |    21     1.1   |     47     5.3   |    18     1.1
cross12.ine      |   157    17.1   |    215    18.1   |   180    17.2
in7.ine          |    47     1.7   |    149     6.1   |    27     0.9
kkd38_6.ine      |   498    28.3   |   1870   113.2   |   218    14.2
kq20_11_m.ine    |    42     1.7   |    153     6.1   |    27     0.9
metric80_16.ine  |    39     2.3   |     76     5.4   |    25     2.0
mit31-20.ine     |  1109    88.7   |  35629   702.2   |   816    60.1
mp6.ine          |    86     6.4   |    215    17.9   |    72     8.0
reg600-5_m.ext   |   906    24.7   |   3062   119.1   |   723    14.0
sampleh8.ine     |  5916   307.4   |  42339  1420.7   |  3309   154.1
trunc10.ine      |  1274    91.7   |   5212   396.6   |   803    89.9


The second experiment is meant to evaluate the efficiency gains obtained
in a more appropriate context, i.e., when processing polyhedra that are not
topologically closed. To this end, we consider the same benchmark discussed
in [3, Table 2],⁷ which highlights the efficiency improvement resulting from the
adoption of an enhanced evaluation strategy (where a knowledgeable user of the
library explicitly invokes, when appropriate, the strong minimization procedures
for ε-representations) with respect to the standard evaluation strategy (where
the user simply performs the required computation, leaving the burden of
optimization to the library developers). In Table 4 we report the results obtained
for the most expensive test among those described in [3, Table 2], comparing
the standard and enhanced evaluation strategies for the ε-representation (rows
1 and 2) with the new algorithm (row 3). For each algorithm we show in column
2 the total number of iterations of the conversion procedures and, in the next
two columns, the median and maximum sizes of the representations computed
at each iteration (i.e., the size of the intermediate results); in columns from 5 to
8 we show the numbers of non-incremental (full) and incremental calls to the
conversion procedures, together with the corresponding time spent (in
milliseconds); in column 9 we show the time spent in strong minimization of
ε-representations; in the final column, we show the overall time ratio, computed
with respect to the time spent by the new algorithm.

⁵ All experiments have been performed on a laptop with an Intel Core i7-3632QM
CPU, 16 GB of RAM and running GNU/Linux 4.13.0-25.

⁶ We only show the tests where PPL time on closed polyhedra is above 20 ms.

⁷ The test dualhypercubes.cc is distributed with the source code of the PPL.


Table 4. Comparing ε-representation based (standard and enhanced) computations
for NNC polyhedra with the new conversion procedures.

algorithm          | # iter |  iter sizes   |  full conv  |  incr conv   | ε-min |  time
                   |        | median   max  | num   time  | num    time  |  time |  ratio
ε-repr standard    |   1142 |   3706  7259  |   4     11  |   3   30336  |    27 | 1460.9
ε-repr enhanced    |    525 |    109  1661  |   7    204  |   0       -  |    29 |   11.2
⟨SK, NS⟩ standard  |    314 |     62   180  |   4      6  |   3      15  |     - |    1.0


Even though adopting the standard computation strategy (requiring no clever 
guess by the end user), the new algorithm obtains impressive time improvements, 
outperforming not only the standard, but also the enhanced computation strategy
for the ε-representation. The reason for the latter efficiency improvement is
that the enhanced computation strategy, when invoking the strong minimization 
procedures, interferes with incrementality: the figures in Table 4 confirm that the 
new algorithm performs three of the seven required conversions in an incremental 
way, while in the enhanced case they are all non-incremental. Moreover, a com- 
parison of the iteration counts and the sizes of the intermediate results provides 
further evidence that the new algorithm is able to maintain a non-redundant 
description even during the iterations of a conversion. 


6 Conclusion 


We have presented a new approach for the representation of NNC polyhedra in 
the Double Description framework, avoiding the use of slack variables and
distinguishing between the skeleton component, encoded geometrically, and the
non-skeleton component, provided with a combinatorial encoding. We have proposed
and implemented a variant of the Chernikova conversion procedure achieving
significant efficiency improvements with respect to a state-of-the-art
implementation of the domain of NNC polyhedra, thereby providing a solution to all
the issues affecting the ε-representation approach. As future work, we plan to
develop a full implementation of the domain of NNC polyhedra based on this new 
representation. To this end, we will have to reconsider each semantic operator 
already implemented by the existing libraries (which are based on the addition 
of a slack variable), so as to propose, implement and experimentally evaluate a 
corresponding correct specification based on the new approach.


Abstract. Given a relational specification between Boolean inputs and
outputs, the goal of Boolean functional synthesis is to synthesize each 
output as a function of the inputs such that the specification is met. In 
this paper, we first show that unless some hard conjectures in complex- 
ity theory are falsified, Boolean functional synthesis must generate large 
Skolem functions in the worst-case. Given this inherent hardness, what 
does one do to solve the problem? We present a two-phase algorithm, 
where the first phase is efficient both in terms of time and size of synthe- 
sized functions, and solves a large fraction of benchmarks. To explain this 
surprisingly good performance, we provide a sufficient condition under 
which the first phase must produce correct answers. When this condition 
fails, the second phase builds upon the result of the first phase, possibly 
requiring exponential time and generating exponential-sized functions in 
the worst-case. Detailed experimental evaluation shows our algorithm to 
perform better than other techniques for a large number of benchmarks. 


Keywords: Skolem functions · Synthesis · SAT solvers · CEGAR based approach


1 Introduction 


The algorithmic synthesis of Boolean functions satisfying relational specifica- 
tions has long been of interest to logicians and computer scientists. Informally, 
given a Boolean relation between input and output variables denoting the
specification, our goal is to synthesize each output as a function of the inputs
such that the relational specification is satisfied. Such functions have also been
called Skolem functions in the literature [23,29]. Boole [8] and Löwenheim [27]
studied variants of this problem in the context of finding most general unifiers.
While
these studies are theoretically elegant, implementations of the underlying tech- 
niques have been found to scale poorly beyond small problem instances [28]. 
More recently, synthesis of Boolean functions has found important applications 
in a wide range of contexts including reactive strategy synthesis [4,19,40],
certified QBF-SAT solving [7,21,31,34], automated program synthesis [35,37],
circuit repair and debugging [22], disjunctive decomposition of symbolic
transition relations [39] and the like. This has spurred recent interest in
developing practically efficient Boolean function synthesis algorithms. The
resulting new generation of tools [3,17,23,29,33,34,38] has enabled synthesis of
Boolean functions from much larger and more complex relational specifications
than those that could be handled by earlier techniques, viz. [20,21,28].

© The Author(s) 2018

In this paper, we re-examine the Boolean functional synthesis problem from 
both theoretical and practical perspectives. Our investigation shows that unless 
some hard conjectures in complexity theory are falsified, Boolean functional 
synthesis must necessarily generate super-polynomial sized Skolem functions, 
thereby requiring super-polynomial time, in the worst-case. Therefore, it is 
unlikely that an efficient algorithm exists for solving all instances of Boolean 
functional synthesis. There are two ways to address this hardness in practice: (i) 
design algorithms that are provably efficient but may give “approximate” Skolem 
functions that are correct on only a fraction of all possible input assignments, 
or (ii) design a phased algorithm, wherein the initial phase(s) is/are provably 
efficient and solve a subset of problem instances, and subsequent phase(s) have 
worst-case exponential behaviour and solve all remaining problem instances. In 
this paper, we combine the two approaches while giving heavy emphasis on effi- 
cient instances. We also provide a sufficient condition for our algorithm to be 
efficient, which indeed is borne out by our experiments. 

The primary contributions of this paper can be summarized as follows. 


1. We start by showing that unless P = NP, there exist problem instances where 
Boolean functional synthesis must take super-polynomial time. Moreover, if 
the non-uniform exponential time hypothesis [14] holds, there exist problem 
instances where Boolean functional synthesis must generate exponential sized 
Skolem functions, thereby also requiring at least exponential time. 

2. We present a new two-phase algorithm for Boolean functional synthesis. 

(a) Phase 1 of our algorithm generates candidate Skolem functions of size 
polynomial in the input specification. This phase makes polynomially 
many calls to an NP oracle (SAT solver in practice). Hence it directly 
benefits from the progress made by the SAT solving community, and is
efficient in practice. Our experiments indicate that Phase 1 suffices to 
solve a large majority of publicly available benchmarks. 

(b) However, there are indeed cases where the first phase is not enough (our 
theoretical results imply that such cases likely exist). In such cases, the 
first phase provides good candidate Skolem functions as starting points 
for the second phase. Phase 2 of our algorithm starts from these candi- 
date Skolem functions, and uses a CEGAR-based approach to produce 
correct Skolem functions whose size may indeed be exponential in the 
input specification. 

3. We analyze the surprisingly good performance of the first phase (especially 
in light of the theoretical hardness results) and show a sufficient condition 
on the structure of the input representation that guarantees correctness of 
the first phase. Interestingly, popular representations like ROBDDs [11] give
rise to input structures that satisfy this condition. The goodness of Skolem
functions generated in this phase of the algorithm can also be quantified
with high confidence by invoking an approximate model counter [13], whose
complexity lies in $\mathrm{BPP}^{\mathrm{NP}}$.

4. We conduct an extensive set of experiments over a variety of benchmarks, 
and show that our algorithm performs favourably vis-à-vis state-of-the-art
algorithms for Boolean functional synthesis. 


Related Work. The literature contains several early theoretical studies on vari- 
ants of Boolean functional synthesis [6,8,9,16,27,30]. More recently, researchers 
have tried to build practically efficient synthesis tools that scale to medium 
or large problem instances. In [29], Skolem functions for X are extracted from
a proof of validity of $\forall Y\, \exists X\, F(X, Y)$. Unfortunately, this doesn't
work when $\forall Y\, \exists X\, F(X, Y)$ is not valid, despite this class of
problems being important, as discussed in [3,17]. Inspired by the spectacular
effectiveness of CDCL-based SAT
solvers, an incremental determinization technique for Skolem function synthesis 
was proposed in [33]. In [20,39], a synthesis approach based on iterated compo- 
sitions was proposed. Unfortunately, as has been noted in [17,23], this does not 
scale to large benchmarks. A recent work [17] adapts the composition-based app- 
roach to work with ROBDDs. For factored specifications, ideas from symbolic 
model checking using implicitly conjoined ROBDDs have been used to enhance 
the scalability of the technique further in [38]. In the genre of CEGAR-based 
techniques, [23] showed how CEGAR can be used to synthesize Skolem func- 
tions from factored specifications. Subsequently, a compositional and parallel 
technique for Skolem function synthesis from arbitrary specifications represented 
using AIGs was presented in [3]. The second phase of our algorithm builds on 
some of this work. In addition to the above techniques, template-based [37] or 
sketch-based [36] approaches have been found to be effective for synthesis when 
we have information about the set of candidate solutions. A framework for func- 
tional synthesis that reasons about some unbounded domains such as integer 
arithmetic, was proposed in [25]. 


2 Notations and Problem Statement 


A Boolean formula $F(z_1, \ldots, z_p)$ on p variables is a mapping
$F : \{0,1\}^p \to \{0,1\}$. The set of variables $\{z_1, \ldots, z_p\}$ is called
the support of the formula, and denoted sup(F). A literal is either a variable or
its complement. We use $F|_{z_i=1}$ (resp. $F|_{z_i=0}$) to denote the positive
(resp. negative) cofactor of F with respect to $z_i$. A satisfying assignment or
model of F is a mapping of variables in sup(F) to {0,1} such that F evaluates to 1
under this assignment. If $\pi$ is a model of F, we write $\pi \models F$ and use
$\pi(z_i)$ to denote the value assigned to $z_i \in \mathrm{sup}(F)$ by $\pi$. Let
$Z = (z_{i_1}, z_{i_2}, \ldots, z_{i_j})$ be a sequence of variables in sup(F). We
use $\pi{\downarrow}Z$ to denote the projection of $\pi$ on Z, i.e. the sequence
$(\pi(z_{i_1}), \pi(z_{i_2}), \ldots, \pi(z_{i_j}))$.

A Boolean formula is in negation normal form (NNF) if (i) the only operators
used in the formula are conjunction ($\wedge$), disjunction ($\vee$) and negation
($\neg$), and (ii) negation is applied only to variables. Every Boolean formula can
be converted to a semantically equivalent formula in NNF. We assume an NNF formula
is represented by a rooted directed acyclic graph (DAG), where internal nodes are
labeled by $\wedge$ and $\vee$, and leaves are labeled by literals. In this paper,
we use AIGs [24] as the initial representation of specifications. Given an AIG with
t nodes, an equivalent NNF formula of size O(t) can be constructed in O(t) time.
We use |F| to denote the number of nodes in a DAG representation of F.

Let $\alpha$ be the subformula represented by an internal node N (labeled by
$\wedge$ or $\vee$) in a DAG representation of an NNF formula. We use
$lits(\alpha)$ to denote the set of literals labeling leaves that have a path to
the node N representing $\alpha$ in the DAG. A formula is said to be in weak
decomposable NNF, or wDNNF, if it is in NNF and if for every $\wedge$-labeled node
in the DAG, the following holds: let $\alpha = \alpha_1 \wedge \ldots \wedge \alpha_k$
be the subformula represented by the internal node. Then, there is no literal $l$
and distinct indices $i, j \in \{1, \ldots, k\}$ such that $l \in lits(\alpha_i)$
and $\neg l \in lits(\alpha_j)$. Note that wDNNF is a weaker structural requirement
on the NNF representation vis-à-vis the well-studied DNNF representation, which has
elegant properties [15]. Specifically, every DNNF formula is also a wDNNF formula.
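The wDNNF condition can be checked directly on a small formula tree. The following Python sketch is only an illustration under assumed conventions that are not from the paper: a formula is a nested tuple, leaves are `("lit", name, polarity)`, and internal nodes are `("and", ...)` or `("or", ...)`. It recomputes `lits` at every node, so it is quadratic and meant for reading, not for large DAGs.

```python
def lits(node):
    """Set of (variable, polarity) literals labeling the leaves below this node."""
    if node[0] == "lit":
        return {(node[1], node[2])}
    found = set()
    for child in node[1:]:
        found |= lits(child)
    return found


def is_wdnnf(node):
    """True iff no AND node has a literal under one child and its negation under another."""
    if node[0] == "lit":
        return True
    children = node[1:]
    if node[0] == "and":
        child_lits = [lits(c) for c in children]
        for i, lits_i in enumerate(child_lits):
            for j, lits_j in enumerate(child_lits):
                if i != j and any((v, not pol) in lits_j for (v, pol) in lits_i):
                    return False
    return all(is_wdnnf(c) for c in children)


def lit(name, polarity=True):
    return ("lit", name, polarity)


# (x AND y) OR (NOT x AND NOT y): wDNNF.
xnor = ("or", ("and", lit("x"), lit("y")), ("and", lit("x", False), lit("y", False)))
# (x OR y) AND (NOT x OR z): x and NOT x sit under different children of the AND.
clash = ("and", ("or", lit("x"), lit("y")), ("or", lit("x", False), lit("z")))
# (x OR y) AND (x OR z): children share the variable x (so not DNNF), but never
# with opposite polarity, so the formula is still wDNNF.
shared = ("and", ("or", lit("x"), lit("y")), ("or", lit("x"), lit("z")))
```

The third formula shows why the requirement is weaker than decomposability: sharing a variable across conjuncts is allowed as long as the polarities agree.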

We say a literal $l$ is pure in F iff the NNF representation of F has a leaf
labeled $l$, but no leaf labeled $\neg l$. F is said to be positive unate in
$z_i \in \mathrm{sup}(F)$ iff $F|_{z_i=0} \Rightarrow F|_{z_i=1}$. Similarly, F is
said to be negative unate in $z_i$ iff $F|_{z_i=1} \Rightarrow F|_{z_i=0}$. Finally,
F is unate in $z_i$ if F is either positive unate or negative unate in $z_i$. A
function that is not unate in $z_i \in \mathrm{sup}(F)$ is said to be binate in
$z_i$.

We also use $X = (x_1, \ldots, x_n)$ to denote a sequence of Boolean outputs, and
$Y = (y_1, \ldots, y_m)$ to denote a sequence of Boolean inputs. The Boolean
functional synthesis problem, henceforth denoted BFnS, asks: given a Boolean
formula F(X, Y) specifying a relation between inputs $Y = (y_1, \ldots, y_m)$ and
outputs $X = (x_1, \ldots, x_n)$, determine functions
$\Psi = (\psi_1(Y), \ldots, \psi_n(Y))$ such that $F(\Psi, Y)$ holds whenever
$\exists X\, F(X, Y)$ holds. Thus, $\forall Y\, (\exists X\, F(X, Y) \Leftrightarrow F(\Psi, Y))$
must be rendered valid. The function $\psi_i$ is called a Skolem function for $x_i$
in F, and $\Psi = (\psi_1, \ldots, \psi_n)$ is called a Skolem function vector for
X in F.
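To make the problem statement concrete, the validity condition on a Skolem function vector can be checked by brute force on tiny instances. The Python sketch below is an illustration only: the specification `spec` and the candidate functions are hypothetical examples, and exhaustive enumeration stands in for the SAT-based reasoning a real tool would use.

```python
from itertools import product


def is_skolem_vector(F, psi, n, m):
    """Check that F(Psi(Y), Y) holds exactly when some X satisfies F(X, Y)."""
    for Y in product((0, 1), repeat=m):
        realizable = any(F(X, Y) for X in product((0, 1), repeat=n))
        X_psi = tuple(int(bool(p(Y))) for p in psi)
        if bool(F(X_psi, Y)) != realizable:
            return False
    return True


# Hypothetical specification with one output x1 and two inputs y1, y2:
# F = (x1 OR y1) AND (NOT x1 OR y2).
def spec(X, Y):
    (x1,), (y1, y2) = X, Y
    return bool((x1 or y1) and (not x1 or y2))


psi_a = [lambda Y: not Y[0]]   # x1 := NOT y1
psi_b = [lambda Y: Y[1]]       # x1 := y2
psi_bad = [lambda Y: 0]        # x1 := 0 fails when y1 = 0 and y2 = 1

ok_a = is_skolem_vector(spec, psi_a, n=1, m=2)
ok_b = is_skolem_vector(spec, psi_b, n=1, m=2)
ok_bad = is_skolem_vector(spec, psi_bad, n=1, m=2)
```

Both `psi_a` and `psi_b` pass the check, which also illustrates that Skolem functions are in general not unique.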

For $1 \leq i \leq j \leq n$, let $X_i^j$ denote the subsequence
$(x_i, x_{i+1}, \ldots, x_j)$ and let $F^{(i-1)}(X_i^n, Y)$ denote
$\exists X_1^{i-1}\, F(X_1^{i-1}, X_i^n, Y)$. It has been argued in [3,17,20,23]
that given a relational specification F(X, Y), the BFnS problem can be solved
by first ordering the outputs, say as $x_1 \prec x_2 \cdots \prec x_n$, and then
synthesizing a function $\psi_i(X_{i+1}^n, Y)$ for each $x_i$ such that
$F^{(i-1)}(\psi_i, X_{i+1}^n, Y) \Leftrightarrow \exists x_i\, F^{(i-1)}(x_i, X_{i+1}^n, Y)$.
Once all such $\psi_i$ are obtained, one can substitute $\psi_{i+1}$ through
$\psi_n$ for $x_{i+1}$ through $x_n$ respectively, in $\psi_i$ to obtain a Skolem
function for $x_i$ as a function of only Y. We adopt this approach, and therefore
focus on obtaining $\psi_i$ in terms of $X_{i+1}^n$ and Y. Furthermore, we know
from [20,23] that a function $\psi_i$ is a Skolem function for $x_i$ iff it
satisfies $\Delta_i^F \Rightarrow \psi_i \Rightarrow \neg\Gamma_i^F$, where
$\Delta_i^F \equiv \neg\exists X_1^{i-1}\, F(X_1^{i-1}, 0, X_{i+1}^n, Y)$ and
$\Gamma_i^F \equiv \neg\exists X_1^{i-1}\, F(X_1^{i-1}, 1, X_{i+1}^n, Y)$.
When F is clear from the context, we often omit it and write $\Delta_i$ and
$\Gamma_i$. It is easy to see that both $\Delta_i$ and $\neg\Gamma_i$ serve as
Skolem functions for $x_i$ in F.
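The ordering-based scheme, with $\Delta_i$ or $\neg\Gamma_i$ used as the Skolem function for each output, can be sketched in Python on small instances. This is an illustration under stated assumptions: the quantifier $\exists X_1^{i-1}$ is expanded by enumeration (exponential, unlike the paper's algorithm), the functions are evaluated pointwise for a given Y rather than built as circuits, and `spec` and `spec2` are hypothetical specifications.

```python
from itertools import product


def delta(F, i, tail, Y):
    """Delta_i: with x_i = 0, no choice of x_1..x_{i-1} satisfies F."""
    return not any(F(head + (0,) + tail, Y) for head in product((0, 1), repeat=i - 1))


def gamma(F, i, tail, Y):
    """Gamma_i: with x_i = 1, no choice of x_1..x_{i-1} satisfies F."""
    return not any(F(head + (1,) + tail, Y) for head in product((0, 1), repeat=i - 1))


def skolem_values(F, n, Y, use_delta=True):
    """Evaluate psi_n, ..., psi_1 on Y, substituting later outputs into earlier ones."""
    tail = ()  # values already fixed for x_{i+1}, ..., x_n
    for i in range(n, 0, -1):
        x_i = delta(F, i, tail, Y) if use_delta else not gamma(F, i, tail, Y)
        tail = (int(x_i),) + tail
    return tail


# Hypothetical specification: (x1 OR x2) AND (NOT x1 OR y1) AND (NOT x2 OR NOT y1).
def spec(X, Y):
    (x1, x2), (y1,) = X, Y
    return bool((x1 or x2) and (not x1 or y1) and (not x2 or not y1))


# Hypothetical specification that is unrealizable whenever y2 = 0.
def spec2(X, Y):
    (x1,), (y1, y2) = X, Y
    return bool(x1 == y1 and y2)
```

For `spec`, both choices yield the output vector $(y_1, \neg y_1)$; for `spec2`, the computed outputs satisfy the specification exactly on the inputs where it is realizable.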


What’s Hard About Boolean Functional Synthesis? 255 


3 Complexity-Theoretical Limits 


In this section, we investigate the computational complexity of BFnS. It is easy 
to see that BFnS can be solved in EXPTIME. Indeed a naive solution would be 
to enumerate all possible values of inputs Y and invoke a SAT solver to find 
values of X corresponding to each valuation of Y that makes F(X,Y) true. 
This requires worst-case time exponential in the number of inputs and outputs, 
and may produce an exponential-sized circuit. Given this, one can ask if we can 
develop a better algorithm that works faster and synthesizes “small” Skolem 
functions in all cases? Our first result shows that existence of such small Skolem 
functions would violate hard complexity-theoretic conjectures. 


Theorem 1. 1. Unless P = NP, there exist problem instances where any
algorithm for BFnS must take super-polynomial time¹.

2. Unless the non-uniform exponential-time hypothesis (or $\mathrm{ETH}_{nu}$)
fails, there exist problem instances where any algorithm for BFnS must generate
Skolem functions of size exponential in the input size.


A consequence of the second statement is that, under the same hypothesis, there
must exist an instance of BFnS for which any algorithm must take exponential
time. The exponential-time hypothesis ETH and its strengthened version, the
non-uniform exponential-time hypothesis $\mathrm{ETH}_{nu}$, are unproven
computational hardness assumptions (see [14,18]), which have been used to show
that several classical decision, functional and parametrized NP-complete problems
(such as p-Clique) are unlikely to have sub-exponential algorithms.
$\mathrm{ETH}_{nu}$ states that there is no family of algorithms (one for each
family of inputs of size n) that can solve 3-SAT in subexponential time. In [14] it
is shown that if $\mathrm{ETH}_{nu}$ holds, then p-Clique, the parametrized clique
problem, cannot be solved in sub-exponential time, i.e., for all
$d \in \mathbb{N}$, and sufficiently large fixed k, determining whether a graph G
has a clique of size k is not in $\mathrm{DTIME}(n^d)$.


Proof. We describe a reduction from p-Clique to BFnS. Given an undirected
graph G = (V, E) on n vertices and a number k (encoded in binary), we want
to check if G has a clique of size k. We encode the graph as follows: each vertex
$v \in V$ is identified by a unique number in $\{1, \ldots, n\}$, and for every
$(i, j) \in V \times V$, we introduce an input variable $y_{i,j}$ that is set to 1
iff $(i, j) \in E$. We call the resulting vector of input variables $\mathbf{y}$.
We also have additional input variables $\mathbf{z} = z_1, \ldots, z_m$, which
represent the binary encoding of k ($m = \lceil \log_2 k \rceil$). Finally,
we introduce output variables $x_v$ for each $v \in V$, whose values determine
which vertices are present in the clique. Let $\mathbf{x}$ denote the vector of
$x_v$ variables. Given inputs $Y = \{\mathbf{y}, \mathbf{z}\}$ and outputs
$X = \{\mathbf{x}\}$, our specification is represented by a circuit F over X, Y
that verifies whether the vertices encoded by X indeed form a k-clique of the
graph G. The circuit F is constructed as follows:


1. For every i, j such that $1 \leq i < j \leq n$, we construct a sub-circuit
implementing $x_i \wedge x_j \Rightarrow y_{i,j}$. The outputs of all such
subcircuits are conjoined to give an intermediate output, say EdgesOK. Clearly,
all the subcircuits taken together have size $O(n^2)$.

2. We have a tree of binary adders implementing $x_1 + x_2 + \ldots + x_n$. Let the
$\lceil \log_2 n \rceil$-bit output of the adder be denoted CliqueSz. The size of
this adder is clearly O(n).

3. We have an equality checker that checks if CliqueSz = k. Clearly, this
subcircuit has size $\lceil \log_2 n \rceil$. Let the output of this equality
checker be called SizeOK.

4. The output of the specification circuit F is EdgesOK $\wedge$ SizeOK.

¹ Since the submission of this paper, we have obtained a sharper complexity result.
Details of this can be found in [2].


Given an instance $Y = \{\mathbf{y}, \mathbf{z}\}$ of p-Clique, we now consider the
specification F(X, Y) as constructed above and feed it as input to any algorithm A
for solving BFnS. Let $\Psi$ be the Skolem function vector output by A. For each
$i \in \{1, \ldots, n\}$, we now feed $\psi_i$ to the input $x_i$ of the circuit F.
This effectively constructs a circuit for $F(\Psi, Y)$. It is easy to see from the
definition of Skolem functions that for every valuation of Y, the function
$F(\Psi, Y)$ evaluates to 1 iff the graph encoded by Y contains a clique of size k.
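The reduction can be mirrored in a few lines of Python. This is a sketch under assumptions that are not from the paper: the graph is given as a symmetric 0/1 adjacency matrix `adj`, k is passed as an integer instead of bits $z_1, \ldots, z_m$, and `brute_force_skolem` is a hypothetical exponential-time stand-in for the Skolem function vector that an algorithm A would return.

```python
from itertools import combinations, product


def clique_spec(x, adj, k):
    """F(X, Y): EdgesOK AND SizeOK for the vertex selection x."""
    n = len(x)
    # EdgesOK: every pair of selected vertices must be joined by an edge.
    edges_ok = all(adj[i][j] for i, j in combinations(range(n), 2) if x[i] and x[j])
    # SizeOK: the number of selected vertices (CliqueSz) equals k.
    size_ok = sum(x) == k
    return edges_ok and size_ok


def brute_force_skolem(adj, k):
    """Stand-in for Psi(Y): some x satisfying F if one exists, else all zeros."""
    n = len(adj)
    for x in product((0, 1), repeat=n):
        if clique_spec(x, adj, k):
            return x
    return (0,) * n


def has_clique(adj, k):
    """Evaluate F(Psi(Y), Y): true iff the graph has a clique of size k."""
    return clique_spec(brute_force_skolem(adj, k), adj, k)


# Illustrative graph: a triangle 0-1-2 plus a pendant edge 2-3.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
```

Whatever method produces the Skolem vector, evaluating the specification on its output answers the clique question, which is why small Skolem circuits would give a fast clique algorithm.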

Using this reduction, we can complete the proofs of both our statements: 


1. If the circuits for the Skolem functions $\Psi$ are super-polynomial sized, then
of course any algorithm generating $\Psi$ must take super-polynomial time. On
the other hand, if the circuits for the Skolem functions $\Psi$ are always
poly-sized, then $F(\Psi, Y)$ is polynomial-sized, and evaluating it takes time
that is polynomial in the input size. Thus, if A is a polynomial-time algorithm, we
also get an algorithm for solving p-Clique in polynomial time, which implies
that P = NP.

2. If the circuits for the Skolem functions $\Psi$ are sub-exponential sized in the
input n, then $F(\Psi, Y)$ is also sub-exponential sized and can be evaluated
in sub-exponential time. It then follows that we can solve any instance of
p-Clique of input length n in sub-exponential time, which violates
$\mathrm{ETH}_{nu}$. Note that since our circuits can be different for different
input lengths, we may have different algorithms for different n. Hence we have to
appeal to the non-uniform variant of ETH.


Theorem 1 implies that efficient algorithms for BFnS are unlikely. We there- 
fore propose a two-phase algorithm to solve BFnS in practice. The first phase 
runs in polynomial time relative to an NP-oracle and generates polynomial- 
sized “approximate” Skolem functions. We show that under certain structural 
restrictions on the NNF representation of F, the first phase always returns exact 
Skolem functions. However, these structural restrictions may not always be met. 
An NP-oracle can be used to check if the functions computed by the first phase 
are indeed exact Skolem functions. In case they aren't, we proceed to the second 
phase of our algorithm that runs in worst-case exponential time. Below, we
discuss the first phase in detail. The second phase is an adaptation of an existing
CEGAR-based technique and is described briefly later.


4 Phase 1: Efficient Polynomial-Sized Synthesis


An easy consequence of the definition of unateness is the following. 


Proposition 1. If F(X, Y) is positive (resp. negative) unate in $x_i$, then
$\psi_i = 1$ (resp. $\psi_i = 0$) is a correct Skolem function for $x_i$.


All omitted proofs, including that of the above, may be found in [2]. The above
result gives us a way to identify outputs $x_i$ for which a Skolem function can
be easily computed. Note that if $x_i$ (resp. $\neg x_i$) is a pure literal in F,
then F is positive (resp. negative) unate in $x_i$. However, the converse is not
necessarily true. In general, a semantic check is necessary for unateness. In fact,
it follows from the definition of unateness that F is positive (resp. negative)
unate in $x_i$ iff the formula $\eta_i^+$ (resp. $\eta_i^-$) defined below is
unsatisfiable.

$\eta_i^+ = F(X_1^{i-1}, 0, X_{i+1}^n, Y) \wedge \neg F(X_1^{i-1}, 1, X_{i+1}^n, Y)$.   (1)
$\eta_i^- = F(X_1^{i-1}, 1, X_{i+1}^n, Y) \wedge \neg F(X_1^{i-1}, 0, X_{i+1}^n, Y)$.   (2)

Note that each such check involves a single invocation of an NP-oracle, and a
variant of this method is described in [5].
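The semantic unateness check of Eqs. (1) and (2) can be sketched in Python as follows. The sketch replaces the NP-oracle call with exhaustive enumeration, so it only suits tiny examples; the function names and the two sample specifications are hypothetical.

```python
from itertools import product


def eta_sat(F, i, n, m, low, high):
    """Is F[x_i = low] AND NOT F[x_i = high] satisfiable? (brute-force oracle)"""
    for X in product((0, 1), repeat=n):
        for Y in product((0, 1), repeat=m):
            X_low = X[:i - 1] + (low,) + X[i:]
            X_high = X[:i - 1] + (high,) + X[i:]
            if F(X_low, Y) and not F(X_high, Y):
                return True
    return False


def is_positive_unate(F, i, n, m):
    # Eq. (1): eta_i^+ = F[x_i = 0] AND NOT F[x_i = 1] must be unsatisfiable.
    return not eta_sat(F, i, n, m, low=0, high=1)


def is_negative_unate(F, i, n, m):
    # Eq. (2): eta_i^- = F[x_i = 1] AND NOT F[x_i = 0] must be unsatisfiable.
    return not eta_sat(F, i, n, m, low=1, high=0)


# x1 OR (NOT x1 AND y1): NOT x1 occurs syntactically, so x1 is not a pure
# literal, yet the formula equals x1 OR y1 and is positive unate in x1.
def hidden_unate(X, Y):
    return bool(X[0] or (not X[0] and Y[0]))


# x1 XOR y1 is binate in x1.
def xor_spec(X, Y):
    return X[0] != Y[0]
```

The first specification is an instance of the remark above that purity is sufficient but not necessary for unateness.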

If F is binate in an output $x_i$, Proposition 1 doesn't help in synthesizing
$\psi_i$. Towards synthesizing Skolem functions for such outputs, recall the
definitions of $\Delta_i$ and $\Gamma_i$ from Sect. 2. Clearly, if we can compute
these functions, we can solve BFnS. While computing $\Delta_i$ and $\Gamma_i$
exactly for all $x_i$ is unlikely to be efficient in general (in light of
Theorem 1), we show that polynomial-sized "good" approximations of $\Delta_i$ and
$\Gamma_i$ can be computed efficiently. As our experiments show, these
approximations are good enough to solve BFnS for several benchmarks. Furthermore,
with access to an NP-oracle, we can also check when these approximations are indeed
good enough.

Given a relational specification F(X, Y), we use $\hat{F}(X, \overline{X}, Y)$ to
denote the formula obtained by first converting F to NNF, and then replacing every
occurrence of $\neg x_i$ ($x_i \in X$) in the NNF formula with a fresh variable
$\overline{x}_i$. As an example, suppose
$F(X, Y) = (x_1 \vee \neg(x_2 \vee y_1)) \vee \neg(x_2 \vee \neg(y_2 \wedge \neg y_1))$.
Then
$\hat{F}(X, \overline{X}, Y) = (x_1 \vee (\overline{x}_2 \wedge \neg y_1)) \vee (\overline{x}_2 \wedge y_2 \wedge \neg y_1)$.
The following are easy to see.


Proposition 2. (a) $\hat{F}(X, \overline{X}, Y)$ is positive unate in both X and
$\overline{X}$.
(b) Let $\neg X$ denote $(\neg x_1, \ldots, \neg x_n)$. Then
$F(X, Y) \Leftrightarrow \hat{F}(X, \neg X, Y)$.
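The construction of $\hat{F}$ can be sketched in Python on formula trees. The tuple encoding, the `_bar` suffix for fresh variables, and the function names are assumptions made for this illustration; the example formula is the one given in the text.

```python
def to_nnf(f, negate=False):
    """Push negations down to the variables; leaves become ('lit', name, polarity)."""
    op = f[0]
    if op == "var":
        return ("lit", f[1], not negate)
    if op == "not":
        return to_nnf(f[1], not negate)
    # De Morgan: a negated AND becomes an OR of negated children, and vice versa.
    if negate:
        op = "or" if op == "and" else "and"
    return (op,) + tuple(to_nnf(g, negate) for g in f[1:])


def hat(nnf, outputs):
    """Replace each negative literal of an output x by the fresh variable x + '_bar'."""
    if nnf[0] == "lit":
        _, name, positive = nnf
        if name in outputs and not positive:
            return ("lit", name + "_bar", True)
        return nnf
    return (nnf[0],) + tuple(hat(g, outputs) for g in nnf[1:])


def evaluate(nnf, env):
    """Evaluate an NNF tree under a dict mapping variable names to 0/1."""
    if nnf[0] == "lit":
        return bool(env[nnf[1]]) == nnf[2]
    results = [evaluate(g, env) for g in nnf[1:]]
    return all(results) if nnf[0] == "and" else any(results)


def V(name):
    return ("var", name)


# F = (x1 OR NOT(x2 OR y1)) OR NOT(x2 OR NOT(y2 AND NOT y1))
F = ("or",
     ("or", V("x1"), ("not", ("or", V("x2"), V("y1")))),
     ("not", ("or", V("x2"), ("not", ("and", V("y2"), ("not", V("y1")))))))
F_nnf = to_nnf(F)
F_hat = hat(F_nnf, outputs={"x1", "x2"})
```

Since every output variable and every fresh variable occurs only positively in `F_hat`, positive unateness (Proposition 2(a)) is immediate, and setting each `x_bar` to the complement of `x` recovers F (Proposition 2(b)).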


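For every $i \in \{1, \ldots, n\}$, we can split $X = (x_1, \ldots, x_n)$ into two
parts, $X_1^i$ and $X_{i+1}^n$, and represent $\hat{F}(X, \overline{X}, Y)$ as
$\hat{F}(X_1^i, X_{i+1}^n, \overline{X}_1^i, \overline{X}_{i+1}^n, Y)$. We use these
representations of $\hat{F}$ interchangeably, depending on the context. For
$b, c \in \{0, 1\}$, let $b^i$ (resp. $c^i$) denote a vector of i b's (resp. c's).
For notational convenience, we use
$\hat{F}(b^i, X_{i+1}^n, c^i, \overline{X}_{i+1}^n, Y)$ to denote
$\hat{F}(X_1^i, X_{i+1}^n, \overline{X}_1^i, \overline{X}_{i+1}^n, Y)|_{X_1^i = b^i,\, \overline{X}_1^i = c^i}$
in the subsequent discussion. The following is an easy consequence of
Proposition 2.


Proposition 3. For every $i \in \{1, \ldots, n\}$, the following holds:
$\hat{F}(0^i, X_{i+1}^n, 0^i, \neg X_{i+1}^n, Y) \Rightarrow \exists X_1^i\, F(X, Y) \Rightarrow \hat{F}(1^i, X_{i+1}^n, 1^i, \neg X_{i+1}^n, Y)$

Proposition 3 allows us to bound $\Delta_i$ and $\Gamma_i$ as follows.

Lemma 1. For every $x_i \in X$, we have:

(a) $\neg\hat{F}(1^{i-1}0, X_{i+1}^n, 1^i, \neg X_{i+1}^n, Y) \Rightarrow \Delta_i \Rightarrow \neg\hat{F}(0^i, X_{i+1}^n, 0^{i-1}1, \neg X_{i+1}^n, Y)$
(b) $\neg\hat{F}(1^i, X_{i+1}^n, 1^{i-1}0, \neg X_{i+1}^n, Y) \Rightarrow \Gamma_i \Rightarrow \neg\hat{F}(0^{i-1}1, X_{i+1}^n, 0^i, \neg X_{i+1}^n, Y)$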

In the remainder of the paper, we only use under-approximations of $\Delta_i$ and
$\Gamma_i$, and use $\delta_i$ and $\gamma_i$ respectively, to denote them. Recall
from Sect. 2 that both $\Delta_i$ and $\neg\Gamma_i$ suffice as Skolem functions
for $x_i$. Therefore, we propose to use either $\delta_i$ or $\neg\gamma_i$
(depending on which has a smaller AIG) obtained from Lemma 1 as our approximation
of $\psi_i$. Specifically,

$\delta_i \equiv \neg\hat{F}(1^{i-1}0, X_{i+1}^n, 1^i, \neg X_{i+1}^n, Y)$,
$\gamma_i \equiv \neg\hat{F}(1^i, X_{i+1}^n, 1^{i-1}0, \neg X_{i+1}^n, Y)$,
$\psi_i = \delta_i$ or $\neg\gamma_i$, depending on which has a smaller AIG.   (3)


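Example 1. Consider the specification $X = Y$, expressed in NNF as $F(X, Y) \equiv
\bigwedge_{i=1}^{n} ((x_i \land y_i) \lor (\neg x_i \land \neg y_i))$. As noted in [33], this is a difficult example for
CEGAR-based QBF solvers, when $n$ is large.

From Eq. 3, $\delta_i = \neg(\neg y_i \land \bigwedge_{j=i+1}^{n} (x_j \Leftrightarrow y_j)) = y_i \lor \bigvee_{j=i+1}^{n} (x_j \Leftrightarrow \neg y_j)$,
and $\gamma_i = \neg(y_i \land \bigwedge_{j=i+1}^{n} (x_j \Leftrightarrow y_j)) = \neg y_i \lor \bigvee_{j=i+1}^{n} (x_j \Leftrightarrow \neg y_j)$. With $\delta_i$ as
the choice of $\psi_i$, we obtain $\psi_i = y_i \lor \bigvee_{j=i+1}^{n} (x_j \Leftrightarrow \neg y_j)$. Clearly, $\psi_n = y_n$. On
reverse-substituting, we get $\psi_{n-1} = y_{n-1} \lor (\psi_n \Leftrightarrow \neg y_n) = y_{n-1} \lor 0 = y_{n-1}$.
Continuing in this way, we get $\psi_i = y_i$ for all $i \in \{1, \ldots, n\}$. The same result
is obtained regardless of whether we choose $\delta_i$ or $\neg\gamma_i$ for each $\psi_i$. Thus, our
approximation is good enough to solve this problem. In fact, it can be shown
that $\delta_i = \Delta_i$ and $\gamma_i = \Gamma_i$ for all $i \in \{1, \ldots, n\}$ in this example.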
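
The reverse substitution in Example 1 can be replayed mechanically. The following is a minimal sketch, assuming a direct Python encoding of the NNF formula with the fresh variables passed as a separate list `xb`; the function names are hypothetical and indices are 0-based, unlike the text.

```python
from itertools import product

def f_hat(x, xb, y):
    # F_hat for X = Y in NNF: AND_i ((x_i ^ y_i) v (xb_i ^ ~y_i)),
    # where xb_i stands for the fresh variable replacing ~x_i.
    return all((xi and yi) or (xbi and not yi) for xi, xbi, yi in zip(x, xb, y))

def delta(i, x_rest, y):
    # Eq. 3 with a 0-based index i:
    # delta_i = ~F_hat(1^{i} 0, x_rest, 1^{i+1}, ~x_rest, Y)
    x = [True] * i + [False] + list(x_rest)
    xb = [True] * (i + 1) + [not v for v in x_rest]
    return not f_hat(x, xb, y)

def skolem_by_back_substitution(y):
    # Compute psi_n first, then substitute it into psi_{n-1}, and so on.
    n = len(y)
    psi = [None] * n
    for i in reversed(range(n)):
        psi[i] = delta(i, psi[i + 1:], y)
    return psi
```

For every input vector the returned list equals `y`, which matches the conclusion of the example that each candidate Skolem function is simply the corresponding input.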


Note that the approximations of Skolem functions, as given in Eq. (3), are 
efficiently computable for all i € {1,...n}, as they involve evaluating F with 
a subset of inputs set to constants. This takes no more than O(|F|) time and 
space. As illustrated by Example 1, these approximations also often suffice to
solve BFnS. The following theorem partially explains this.


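Theorem 2. (a) For $i \in \{1, \ldots, n\}$, suppose the following holds:
$\forall j \in \{1, \ldots, i\}$: $\hat{F}(1^j, X_{j+1}^n, 1^j, \bar{X}_{j+1}^n, Y) \Rightarrow \hat{F}(1^{j-1}0, X_{j+1}^n, 1^j, \bar{X}_{j+1}^n, Y)$
$\lor\ \hat{F}(1^j, X_{j+1}^n, 1^{j-1}0, \bar{X}_{j+1}^n, Y)$
Then $\exists X_1^i F(X, Y) \Leftrightarrow \hat{F}(1^i, X_{i+1}^n, 1^i, \neg X_{i+1}^n, Y)$.


(b) If $\hat{F}(X, \neg X, Y)$ is in wDNNF, then $\delta_i = \Delta_i$ and $\gamma_i = \Gamma_i$ for every
$i \in \{1, \ldots, n\}$.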


Proof. To prove part (a), we use induction on $i$. The base case corresponds to $i = 1$.
Recall that $\exists X_1^1 F(X, Y) \Leftrightarrow \hat{F}(1, X_2^n, 0, \neg X_2^n, Y) \lor \hat{F}(0, X_2^n, 1, \neg X_2^n, Y)$ by
definition. Proposition 3 already asserts that $\exists X_1^1 F(X, Y) \Rightarrow \hat{F}(1, X_2^n, 1, \neg X_2^n, Y)$.
Therefore, if the condition in Theorem 2(a) holds for $i = 1$, we then have
$\hat{F}(1, X_2^n, 1, \neg X_2^n, Y) \Leftrightarrow \hat{F}(1, X_2^n, 0, \neg X_2^n, Y) \lor \hat{F}(0, X_2^n, 1, \neg X_2^n, Y)$, which in turn
is equivalent to $\exists X_1^1 F(X, Y)$. This proves the base case.

Let us now assume (inductive hypothesis) that the statement of Theorem 2(a)
holds for $1 \le i < n$. We prove below that the same statement holds for $i + 1$
as well. Clearly, $\exists X_1^{i+1} F(X, Y) \Leftrightarrow \exists x_{i+1} (\exists X_1^i F(X, Y))$. By the inductive
hypothesis, this is equivalent to $\exists x_{i+1} \hat{F}(1^i, X_{i+1}^n, 1^i, \neg X_{i+1}^n, Y)$. By definition
of existential quantification, this is equivalent to $\hat{F}(1^{i+1}, X_{i+2}^n, 1^i 0, \neg X_{i+2}^n, Y) \lor
\hat{F}(1^i 0, X_{i+2}^n, 1^{i+1}, \neg X_{i+2}^n, Y)$. From the condition in Theorem 2(a), we also have


$\hat{F}(1^{i+1}, X_{i+2}^n, 1^{i+1}, \neg X_{i+2}^n, Y) \Rightarrow \hat{F}(1^i 0, X_{i+2}^n, 1^{i+1}, \neg X_{i+2}^n, Y)$
$\lor\ \hat{F}(1^{i+1}, X_{i+2}^n, 1^i 0, \neg X_{i+2}^n, Y)$


The implication in the reverse direction follows from Proposition 2(a). Thus
we have a bi-implication above, which we have already seen is equivalent to
$\exists X_1^{i+1} F(X, Y)$. This proves the inductive case.

To prove part (b), we first show that if $\hat{F}(X, \neg X, Y)$ is in wDNNF, then the
condition in Theorem 2(a) must hold for all $j \in \{1, \ldots, n\}$. Theorem 2(b) then
follows from the definitions of $\Delta_i$ and $\Gamma_i$ (see Sect. 2), from the statement of
Theorem 2(a) and from the definitions of $\delta_i$ and $\gamma_i$ (see Eq. 3).

For $j \in \{1, \ldots, n\}$, let $\zeta(X_{j+1}^n, \bar{X}_{j+1}^n, Y)$ denote the formula
$\hat{F}(1^j, X_{j+1}^n, 1^j, \bar{X}_{j+1}^n, Y) \land \neg(\hat{F}(1^{j-1}0, X_{j+1}^n, 1^j, \bar{X}_{j+1}^n, Y) \lor \hat{F}(1^j, X_{j+1}^n, 1^{j-1}0, \bar{X}_{j+1}^n, Y))$.
Suppose, if possible, $\hat{F}(X, \neg X, Y)$ is in wDNNF but there exists
$j$ ($1 \le j \le n$) such that $\zeta(X_{j+1}^n, \bar{X}_{j+1}^n, Y)$ is satisfiable. Let $X_{j+1}^n = \sigma$, $\bar{X}_{j+1}^n = \kappa$
and $Y = \theta$ be a satisfying assignment of $\zeta$. We now consider the simplified circuit
obtained by substituting $1^{j-1}$ for $X_1^{j-1}$ as well as for $\bar{X}_1^{j-1}$, $\sigma$ for $X_{j+1}^n$, $\kappa$ for
$\bar{X}_{j+1}^n$ and $\theta$ for $Y$ in the AIG for $\hat{F}$. This simplification replaces the output of
every internal node with a constant (0 or 1), if the node evaluates to a constant
under the above assignment. Note that the resulting circuit can have only $x_j$
and $\bar{x}_j$ as its inputs. Furthermore, since the assignment satisfies $\zeta$, it follows
that the simplified circuit evaluates to 1 if both $x_j$ and $\bar{x}_j$ are set to 1, and it
evaluates to 0 if any one of $x_j$ or $\bar{x}_j$ is set to 0. This can only happen if there
is a node labeled $\land$ in the AIG representing $\hat{F}(X, \neg X, Y)$ with a path leading
from the leaf labeled $x_j$, and another path leading from the leaf labeled $\neg x_j$.
This is a contradiction, since $\hat{F}(X, \neg X, Y)$ is in wDNNF. Therefore, there is no
$j \in \{1, \ldots, n\}$ such that the condition of Theorem 2(a) is violated.


In general, the candidate Skolem functions generated from the approxima- 
tions discussed above may not always be correct. Indeed, the conditions discussed 
above are only sufficient, but not necessary, for the approximations to be exact. 
Hence, we need a separate check to see if our candidate Skolem functions are correct.
To do this, we use an error formula $\varepsilon_\Psi(X', X, Y) \equiv F(X', Y) \land \bigwedge_{i=1}^{n} (x_i \Leftrightarrow
\psi_i) \land \neg F(X, Y)$, as described in [23], and check its satisfiability. The correctness
of this check depends on the following result from [23].

Theorem 3 ([23]). $\varepsilon_\Psi$ is unsatisfiable iff $\Psi$ is a correct Skolem function vector.
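
To make the error formula concrete, here is a small sketch that searches for a satisfying assignment by enumeration. It assumes specifications and candidate functions are given as Python callables; `error_formula_sat`, `spec`, `good` and `bad` are hypothetical names, and exhaustive search stands in for the SAT solver call.

```python
from itertools import product

def error_formula_sat(F, psi, n, m):
    """Return a satisfying assignment (x', x, y) of the error formula, or None."""
    for y in product([False, True], repeat=m):
        # The conjuncts x_i <=> psi_i force x; psi_i may read x_{i+1..n}.
        x = [None] * n
        for i in reversed(range(n)):
            x[i] = psi[i](x[i + 1:], y)
        if F(tuple(x), y):
            continue  # ~F(X, Y) is false here, so this y is not a counterexample
        for xp in product([False, True], repeat=n):
            if F(xp, y):  # F(X', Y) holds although the candidate fails
                return xp, tuple(x), y
    return None

# Toy specification with one output: x0 must equal (y0 and y1).
spec = lambda x, y: x[0] == (y[0] and y[1])
good = [lambda rest, y: y[0] and y[1]]   # correct Skolem function
bad = [lambda rest, y: y[0]]             # wrong whenever y0 = 1 and y1 = 0
```

With `good` the search returns `None`, so by Theorem 3 the candidate vector is correct; with `bad` it returns an assignment whose `y` component is a counterexample.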


Algorithm 1. BFSS


Input: $F(X, Y)$ in NNF (or wDNNF) with inputs $|Y| = m$, outputs $|X| = n$
Output: Candidate Skolem functions $\Psi = (\psi_1, \ldots, \psi_n)$


 1 Initialize: Fix sets $U_0 = U_1 = \emptyset$;
 2 repeat
 3     // Repeatedly checks for unate variables
 4     for each $x_i \in X \setminus (U_0 \cup U_1)$ do
 5         if $F$ is positive unate in $x_i$ // check $x_i$ pure or $\eta_i^+$ (Eq 1) SAT;
 6         then
 7             $F := F[x_i = 1]$, $U_1 = U_1 \cup \{x_i\}$
 8         else if $F$ is negative unate in $x_i$ // $\neg x_i$ pure or $\eta_i^-$ (Eq 2) SAT;
 9         then
10             $F := F[x_i = 0]$, $U_0 = U_0 \cup \{x_i\}$
11 until $F$ is unchanged // No unate variables remaining;
12 Choose an ordering < of $X$ // Section 6 discusses ordering used;
13 for each $x_i \in X$ in < order do
14     if $x_i \in U_j$ for $j \in \{0, 1\}$ // Assume $x_1 < x_2 < \ldots < x_n$;
15     then
16         $\psi_i = j$
17     else
18         $\psi_i$ is as defined in (Eq 3)
19 if error formula $\varepsilon_\Psi$ is UNSAT then
20     terminate and output $\Psi$
21 else
22     call Phase 2


We now combine all the above ingredients to come up with algorithm BFSS
(for Blazingly Fast Skolem Synthesis), as shown in Algorithm 1. The algorithm
can be divided into three parts. In the first part (lines 2-11), unateness is checked.
This is done in two ways: (i) we identify pure literals in F by simply examining
the labels of leaves in the DAG representation of F in NNF, and (ii) we check
the satisfiability of the formulas $\eta_i^+$ and $\eta_i^-$, as defined in Eqs. 1 and 2. This
requires invoking a SAT solver in the worst case, and is repeated at most $O(n^2)$
times until there are no more unate variables. Hence this requires $O(n^2)$ calls to
a SAT solver. Once we have done this, by Proposition 1, the constants 1 or 0 (for
positive or negative unate variables respectively) are correct Skolem functions
for these variables.
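
The unateness loop of lines 2-11 can be sketched as follows. This is an illustration under stated assumptions, not the BFSS implementation: the specification is a Python callable, the standard semantic definition of unateness is used (positive unate in x_i means F with x_i = 0 implies F with x_i = 1), and exhaustive enumeration replaces both the pure-literal check and the SAT calls. All function names are ours.

```python
from itertools import product

def assignments(n, m, fixed):
    """Yield (x, y) pairs that agree with the outputs already fixed to constants."""
    free = [i for i in range(n) if i not in fixed]
    for vals in product([False, True], repeat=len(free)):
        x = [fixed.get(i, False) for i in range(n)]
        for i, v in zip(free, vals):
            x[i] = v
        for y in product([False, True], repeat=m):
            yield tuple(x), y

def is_unate(F, i, n, m, fixed, positive):
    # positive: F[x_i=0] => F[x_i=1]; negative: F[x_i=1] => F[x_i=0]
    lo, hi = (False, True) if positive else (True, False)
    def with_xi(x, v):
        return x[:i] + (v,) + x[i + 1:]
    return all((not F(with_xi(x, lo), y)) or F(with_xi(x, hi), y)
               for x, y in assignments(n, m, fixed))

def unate_phase(F, n, m):
    """Repeat until no new unate output is found; return {index: constant}."""
    fixed = {}
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if i in fixed:
                continue
            if is_unate(F, i, n, m, fixed, positive=True):
                fixed[i] = True      # Skolem function is the constant 1
                changed = True
            elif is_unate(F, i, n, m, fixed, positive=False):
                fixed[i] = False     # Skolem function is the constant 0
                changed = True
    return fixed
```

For the hypothetical specification `lambda x, y: (x[0] or y[0]) and (x[1] == y[0])`, output 0 is positive unate and is fixed to 1, while output 1 is not unate and is left for the second part of the algorithm.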

In the second part, we fix an ordering of the remaining output variables
according to an experimentally sound heuristic, as described in Sect. 6, and
compute candidate Skolem functions for these variables according to Eq. 3. We then
check the satisfiability of the error formula $\varepsilon_\Psi$ to determine if the candidate
Skolem functions are indeed correct. If the error formula is found to be
unsatisfiable, we know from Theorem 3 that we have the correct Skolem functions,
which can therefore be output. This concludes phase 1 of algorithm BFSS. If
the error formula is found to be satisfiable, we move to phase 2 of algorithm
BFSS, an adaptation of the CEGAR-based technique described in [23] and
discussed briefly in Sect. 5. It is not difficult to see that the running time of phase
1 is polynomial in the size of the input, relative to an NP-oracle (SAT solver
in practice). This also implies that the Skolem functions generated can be of at
most polynomial size. Finally, from Theorem 2 we also obtain that if F satisfies
Theorem 2(a), Skolem functions generated in phase 1 are correct. From the above
reasoning, we obtain the following properties of phase 1 of BFSS:


Theorem 4. 1. For all unate variables, phase 1 of BFSS computes correct
Skolem functions.

2. If $\hat{F}$ is in wDNNF, phase 1 of BFSS computes all Skolem functions correctly.

3. The running time of phase 1 of BFSS is polynomial in input size, relative to
an NP-oracle. Specifically, the algorithm makes $O(n^2)$ calls to an NP-oracle.

4. The candidate Skolem functions output by phase 1 of BFSS have size at most
polynomial in the size of the input.


Discussion: We make two crucial and related observations. First, by our hard- 
ness results in Sect. 3, we know that the above algorithm cannot solve BFnS 
for all inputs, unless some well-regarded complexity-theoretic conjectures fail. 
As a result, we must go to phase 2 on at least some inputs. Surprisingly, our 
experiments show that this is not necessary in the majority of benchmarks. 

The second observation tries to understand why phase 1 works in most cases 
in practice. While a conclusive explanation isn’t easy, we believe Theorem 2 
explains the success of phase 1 in several cases. By [15], we know that all Boolean 
functions have a DNNF (and hence wDNNF) representation, although it may take 
exponential time to compute this representation. This allows us to define two 
preprocessing procedures. In the first, we identify cases where we can directly 
convert to wDNNF and use the Phase 1 algorithm above. And in the second, we
use several optimization scripts available in the ABC [26] library to optimize the
AIG representation of F. For a majority of benchmarks, this appears to yield
a representation of F that allows the proof of Theorem 2(a) to go through. For 
the rest, we apply the Phase 2 algorithm as described below. 


Quantitative guarantees of “goodness”. Given our theoretical and practical 
insights of the applicability of phase 1 of BFSS, it would be interesting to measure 
how much progress we have made in phase 1, even if it does not give the correct 
Skolem functions. One way to measure this “goodness” is to estimate the number 
of counterexamples as a fraction of the size of the input space. Specifically, given 
the error formula, we get an approximate count of the number of models for this 
formula projected on the inputs Y. This can be obtained efficiently in practice 
with high confidence using state-of-the-art approximate model counters, viz. [13], 
with complexity in $\mathrm{BPP}^{\mathrm{NP}}$. The approximate count thus obtained, when divided
by $2^{|Y|}$ gives the fraction of input combinations for which the candidate Skolem
functions output by phase 1 do not work correctly. We call this the goodness 
ratio of our approximation. 
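
As a small illustration of the goodness ratio, the sketch below counts counterexample inputs exactly and divides by $2^{|Y|}$. It deliberately swaps the approximate model counter of [13] for plain enumeration, which is only feasible for toy sizes; `goodness_ratio`, `spec`, `good` and `bad` are hypothetical names.

```python
from itertools import product

def goodness_ratio(F, psi, n, m):
    """Fraction of inputs y on which the candidate Skolem functions fail."""
    bad_inputs = 0
    for y in product([False, True], repeat=m):
        # Evaluate the candidates; psi_i may depend on x_{i+1..n}.
        x = [None] * n
        for i in reversed(range(n)):
            x[i] = psi[i](x[i + 1:], y)
        realizable = any(F(xp, y) for xp in product([False, True], repeat=n))
        if realizable and not F(tuple(x), y):
            bad_inputs += 1  # y is a model of the error formula projected on Y
    return bad_inputs / 2 ** m

# Toy specification with one output: x0 must equal (y0 and y1).
spec = lambda x, y: x[0] == (y[0] and y[1])
good = [lambda rest, y: y[0] and y[1]]
bad = [lambda rest, y: y[0]]
```

A ratio of 0 means phase 1 already produced correct functions; for `bad` exactly one of the four input combinations is a counterexample.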


5 Phase 2: Counterexample-Guided Refinement 


For phase 2, we can use any off-the-shelf worst-case exponential-time Skolem 
function generator. However, given that we already have candidate Skolem func- 
tions with guarantees on their “goodness”, it is natural to use them as starting 
points for phase 2. Hence, we start off with candidate Skolem functions for all $x_i$
as computed in phase 1, and then update (or refine) them in a
counterexample-driven manner. Intuitively, a counterexample is a value of the inputs $Y$ for which
there exists a value of $X$ that renders $F(X, Y)$ true, but for which $F(\Psi, Y)$
evaluates to false. As shown in [23], given a candidate Skolem function vector, every
satisfying assignment of the error formula $\varepsilon_\Psi$ gives a counterexample. The
refinement step uses this satisfying assignment to update an appropriate subset of the
approximate $\delta_i$ and $\gamma_i$ functions computed in phase 1. The entire process is then
repeated until no counterexamples can be found. The final updated vector of
Skolem functions then gives a solution of the BFnS problem. Note that this idea
is not new [3,23]. The only significant enhancement we make over the algorithm
in [23] is to use an almost-uniform sampler [12] to efficiently sample the space
of counterexamples almost uniformly. This allows us to do refinement with a
diverse set of counterexamples, instead of using counterexamples in a corner of
the solution space of $\varepsilon_\Psi$ that the SAT solver heuristics zoom down on.


6 Experimental Results 


Experimental methodology. Our implementation consists of two parallel 
pipelines that accept the same input specification but represent them in two 
different ways. The first pipeline takes the input formula as an AIG and builds 
an NNF (not necessarily wDNNF) DAG, while the second pipeline builds an 
ROBDD from the input AIG using dynamic variable reordering (no restrictions 
on variable order), and then obtains a wDNNF representation from it using the 
linear-time algorithm described in [15]. Once the NNF/wDNNF representation is 
built, we use Algorithm 1 in Phase 1 and CEGAR-based synthesis using UNIGEN 
[12] to sample counterexamples in Phase 2. We call this ensemble of two pipelines
BFSS. We compare BFSS with the following algorithms/tools: (i) PARSYN [3],
(ii) CADET [33], (iii) RSYNTH [38], and (iv) ABSSYNTHE-SKOLEM (based on
the BFnS step of ABSSYNTHE [10]).

Our implementation of BFSS uses the ABC [26] library to represent and 
manipulate Boolean functions. Two different SAT solvers can be used with BFSS: 
ABC's default SAT solver, or UNIGEN [12] (to give almost-uniformly distributed 
counterexamples). All our experiments use UNIGEN. 

We consider a total of 504 benchmarks, taken from four different domains: 
(a) forty-eight Arithmetic benchmarks from [17], with varying bit-widths (viz. 
32, 64, 128, 256, 512 and 1024) of arithmetic operators, (b) sixty-eight
Disjunctive Decomposition benchmarks from [3], generated by considering some of the
larger sequential circuits in the HWMCC10 benchmark suite, (c) five
Factorization benchmarks, also from [3], representing factorization of numbers of different
bit-widths (8, 10, 12, 14, 16), and (d) three hundred and eighty three QBFEval
benchmarks, taken from the Prenex 2QBF track of QBFEval 2017 [32]². Since
different tools accept benchmarks in different formats, each benchmark was
converted to both qdimacs and verilog/aiger formats. All benchmarks and the
procedure by which we generated (and converted) them are detailed in [1]. Recall
that we use two pipelines for BFSS. We use “balance; rewrite -l; refactor -l;
balance; rewrite -l; rewrite -lz; balance; refactor -lz; rewrite -lz; balance” as the
ABC script for optimizing the AIG representation of the input specification. We
observed that while this results in only 4 benchmarks being in wDNNF in the 
first pipeline, 219 benchmarks were solved in Phase 1 using this pipeline. This 
is attributable to specifications being unate in several output variables, and also 
satisfying the condition of Theorem 2(a) (while not being in wDNNF). In the 
second pipeline, however, we could represent 230 benchmarks in wDNNF, and 
all of these were solved in Phase 1. 

For each benchmark, the order < (ref. step 12 of Algorithm 1) in which Skolem 
functions are generated is such that the variable which occurs in the transitive 
fan-in of the least number of nodes in the AIG representation of the specifica- 
tion is ordered before other variables. This order (<) is used for both BFSS and 
PARSYN. Note that the order < is completely independent of the dynamic vari- 
able order used to construct an ROBDD of the input specification in the second 
pipeline, prior to getting the wDNNF representation. 
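
One possible reading of this ordering heuristic is sketched below. The DAG encoding (a dictionary from each internal node to its fan-in nodes, with variables as leaves), the function name, and the tie-breaking by input order are all assumptions made for illustration; the text does not specify them.

```python
def order_by_fanin_count(dag, variables):
    """Sort variables by the number of nodes whose transitive fan-in contains them."""
    memo = {}

    def support(node):
        # Set of leaves reachable from `node` through fan-in edges.
        if node not in dag:
            return frozenset([node])  # a leaf is its own support
        if node not in memo:
            leaves = frozenset()
            for child in dag[node]:
                leaves |= support(child)
            memo[node] = leaves
        return memo[node]

    counts = {v: 0 for v in variables}
    for node in dag:
        for leaf in support(node):
            if leaf in counts:
                counts[leaf] += 1
    # Fewest containing nodes first; Python's sort is stable, so ties keep input order.
    return sorted(variables, key=lambda v: counts[v])

# Hypothetical three-gate circuit over outputs x1, x2, x3 and input y1.
example_dag = {"g1": ["x1", "y1"], "g2": ["x2", "g1"], "out": ["g2", "x3"]}
```

Here `x3` feeds only the output node, `x2` feeds two nodes and `x1` feeds all three, so the resulting order is `x3`, `x2`, `x1`.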

All experiments were performed on a message-passing cluster, with 20 cores 
and 64 GB memory per node, each core being a 2.2 GHz Intel Xeon processor. 
The operating system was Cent OS 6.5. Twenty cores were assigned to each 
run of PARSYN. For RSYNTH and CADET a single core on the cluster was used, 
since these tools don't exploit parallel processing. Each pipeline of BFSS was 
executed on a single node; the computation of candidate functions, building of 
error formula and refinement of the counterexamples was performed sequentially 
on 1 thread, and UNIGEN had 19 threads at its disposal (idle during Phase 1). 

The maximum time given for execution of any run was 3600 s. The total 
amount of main memory for any run was restricted to 16GB. The metric used 
to compare the algorithms was time taken to synthesize Boolean functions. The 
time reported for BFSS is the better of the two times obtained from the alterna- 
tive pipelines described above. Detailed results from the individual pipelines are 
available in [2]. 

Results. Of the 504 benchmarks, 177 benchmarks were not solved by any tool:
6 of these are from the Arithmetic benchmarks and 171 from QBFEval.

Table 1 gives a summary of the performance of BFSS (considering the combined
pipelines) over different benchmark suites. Of the 504 benchmarks, BFSS
was successful on 278 benchmarks; of these, 170 are from QBFEval, 68 from
Disjunctive Decomposition, 35 from Arithmetic and 5 from Factorization.


Table 1. BFSS: Performance summary of combined pipelines

Benchmark domain            Total        # Benchmarks   Phase 1   Phase 2   Solved by
                            benchmarks   solved         solved    started   phase 2
QBFEval                     383          170            159       73        11
Arithmetic                  48           35             35        8         0
Disjunctive decomposition   68           68             66        2         2
Factorization               5            5              5         0         0


² The track contains 384 benchmarks, but we were unsuccessful in converting 1
benchmark to some of the formats required by the various tools.

Of the 383 benchmarks in the QBFEval suite, we ran BFSS only on 254 since 
we could not build succinct AIGs for the remaining benchmarks. Of these, 159 
benchmarks were solved by Phase 1 (i.e., 62% of built QBFEval benchmarks) 
and 73 proceeded to Phase 2, of which 11 reached completion. On another 11 
QBFEval benchmarks Phase 1 timed out. Of the 48 Arithmetic benchmarks, 
Phase 1 successfully solved 35 (i.e., ~ 72%) and Phase 2 was started for 8 
benchmarks; Phase 1 timed out on 5 benchmarks. Of the 68 Disjunctive Decom- 
position benchmarks, Phase 1 successfully solved 66 benchmarks (i.e., 97%), 
and Phase 2 was started and reached completion for 2 benchmarks. For the 5 
Factorization benchmarks, Phase 1 was successful on all 5 benchmarks. 

Recall that the goodness ratio is the ratio of the number of counterexamples 
remaining to the total size of the input space after Phase 1. For all benchmarks 
solved by Phase 1, the goodness ratio is 0. We analyzed the goodness ratio at 
the beginning of Phase 2 for 83 benchmarks for which Phase 2 started. For 13
benchmarks this ratio was small (< 0.002), and Phase 2 reached completion for
these. Of the remaining benchmarks, 34 also had a small goodness ratio (< 0.1),
indicating that we were close to the solution at the time of timeout. However,
27 benchmarks in QBFEval had a goodness ratio greater than 0.9, indicating that
most of the counterexamples were not eliminated by timeout.

We next compare the performance of BFSS with other state-of-the-art tools. For
clarity, since the number of benchmarks in the QBFEval suite is considerably 
greater, we plot the QBFEval benchmarks separately. 


BFSS vs CADET: Of the 504 benchmarks, CADET was successful on 231 bench- 
marks, of which 24 belonged to Disjunctive Decomposition, 22 to Arithmetic, 1 
to Factorization and 184 to QBFEval. Figure 1(a) gives the performance of the
two algorithms with respect to time on the QBFEval suite. Here, CADET solved
35 benchmarks that BFSS could not solve, whereas BFSS solved 21 benchmarks
that could not be solved by CADET. Figure 1(b) gives the performance of the
two algorithms with respect to time on the Arithmetic, Factorization and Dis- 
junctive Decomposition benchmarks. In these categories, there were a total of 
62 benchmarks that BFSS solved that CADET could not solve, and there was 1 
benchmark that CADET solved but BFSS did not solve. While CADET takes less 
time on Arithmetic benchmarks and many QBFEval benchmarks, on Disjunctive 
Decomposition and Factorization, BFSS takes less time. 

Fig. 1. BFSS vs CADET: Legend: Q: QBFEval, A: Arithmetic, F: Factorization, D:
Disjunctive Decomposition. TO: benchmarks for which the corresponding algorithm was
unsuccessful.

Fig. 2. BFSS vs PARSYN (for legend see Fig. 1)


BFSS vs PARSYN: Fig. 2 shows the comparison of time taken by BFSS and PARSYN. 
PARSYN was successful on a total of 185 benchmarks, and could solve 1 bench- 
mark which BFSS could not solve. On the other hand, BFSS solved 94 benchmarks 
that PARSYN could not solve. From Fig. 2, we can see that on most of the Arith- 
metic, Disjunctive Decomposition and Factorization benchmarks, BFSS takes less 
time than PARSYN. 

BFSS vs RSYNTH: We next compare the performance of BFSS with RSYNTH. As
shown in Fig. 3, RSYNTH was successful on 51 benchmarks, with 4 benchmarks
that could be solved by RSYNTH but not by BFSS. In contrast, BFSS could solve
231 benchmarks that RSYNTH could not solve! Of the benchmarks that were
solved by both solvers, we can see that BFSS took less time on most of them.

BFSS vs ABSSYNTHE-SKOLEM: ABSSYNTHE-SKOLEM was successful on 217
benchmarks, and could solve 31 benchmarks that BFSS could not solve. In
contrast, BFSS solved a total of 92 benchmarks that ABSSYNTHE-SKOLEM could not.
Figure 4 shows a comparison of running times of BFSS and ABSSYNTHE-SKOLEM.

Fig. 4. BFSS vs ABSSYNTHE-SKOLEM (for legend see Fig. 1)


7 Conclusion 


In this paper, we showed some complexity-theoretic hardness results for the 
Boolean functional synthesis problem. We then developed a two-phase approach 
to solve this problem, where the first phase, which is an efficient algorithm
generating poly-sized functions, surprisingly succeeds in solving a large number of
benchmarks. To explain this, we identified sufficient conditions under which phase 1
gives the correct answer. For the remaining benchmarks, we employed the second 
phase of the algorithm that uses a CEGAR-based approach and builds Skolem 
functions by exploiting recent advances in SAT solvers/approximate counters. 
As future work, we wish to explore further improvements in Phase 2, and other 
structural restrictions on the input that ensure completeness of Phase 1. 

Abstract. Program synthesis is the mechanised construction of software.
One of the main difficulties is the efficient exploration of the very
large solution space, and tools often require a user-provided syntactic 
restriction of the search space. We propose a new approach to program 
synthesis that combines the strengths of a counterexample-guided induc- 
tive synthesizer with those of a theory solver, exploring the solution space 
more efficiently without relying on user guidance. We call this approach 
CEGIS(T), where T is a first-order theory. In this paper, we focus on one
particular challenge for program synthesizers, namely the generation of 
programs that require non-trivial constants. This is a fundamentally diffi- 
cult task for state-of-the-art synthesizers. We present two exemplars, one 
based on Fourier-Motzkin (FM) variable elimination and one based on 
first-order satisfiability. We demonstrate the practical value of CEGIS(T)
by automatically synthesizing programs for a set of intricate benchmarks. 


1 Introduction 


Program synthesis is the problem of finding a program that meets a correctness 
specification given as a logical formula. This is an active area of research in which 
substantial progress has been made in recent years. 

In full generality, program synthesis is an exceptionally difficult problem, and 
thus, the research community has explored pragmatic restrictions. One particularly
successful direction is Syntax-Guided Program Synthesis (SyGuS) [2]. The
key idea of SyGuS is that the user supplements the logical specification with 
a syntactic template for the solution. Leveraging the user's intuition, SyGuS 
reduces the solution space size substantially, resulting in significant speed-ups. 

Unfortunately, it is difficult to provide the syntactic template in many
practical applications. A very obvious exemplar of the limits of the syntax-guided
approach is programs that require non-trivial constants. In such a scenario, the
syntax-guided approach requires that the user provides the exact value of the
constants in the solution.

For illustration, let’s consider a user who wants to synthesize a program that
rounds up a given 32-bit unsigned number x to the next highest power of two. If
we denote the function computed by the program by f(x), then the specification
can be written as $x < 2^{31} \Rightarrow (f(x) \,\&\, (-f(x)) = f(x) \land f(x) \ge x \land 2x \ge f(x))$. The
first conjunct forces f(x) to be a power of two, the other requires it to be the
next highest. A possible solution for this is given by the following C program:


    x = x - 1;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    x = x + 1;
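
As a quick sanity check, the program can be translated to Python and tested against the three properties named in the specification. This is a sketch under assumptions: 32-bit unsigned arithmetic is emulated with an explicit mask, only inputs below 2^31 are checked, and the names `round_up_pow2` and `meets_spec` are ours.

```python
MASK = 0xFFFFFFFF  # emulate 32-bit unsigned wrap-around

def round_up_pow2(x):
    x = (x - 1) & MASK
    x |= x >> 1
    x |= x >> 2
    x |= x >> 4
    x |= x >> 8
    x |= x >> 16
    return (x + 1) & MASK

def meets_spec(x):
    # Checked only for x < 2**31, the range the specification talks about.
    f = round_up_pow2(x)
    is_power_of_two = (f & (-f & MASK)) == f   # f & (-f) == f
    return is_power_of_two and f >= x and 2 * x >= f
```

For example, 5 is rounded up to 8, while 8 is left unchanged because it is already a power of two.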


It is improbable that the user knows that the constants in the solution are 
exactly 1, 2, 4, 8, 16, and thus, she will be unable to explicitly restrict the 
solution space. As a result, synthesizers are very likely to enumerate possible 
combinations of constants, which is highly inefficient. 

In this paper we propose a new approach to program synthesis that combines 
the strengths of a counterexample-guided inductive synthesizer with those of a 
solver for a first-order theory in order to perform a more efficient exploration 
of the solution space, without relying on user guidance. Our inspiration for this 
proposal is DPLL(T), which has boosted the performance of solvers for many
fragments of quantifier-free first-order logic [16,23]. DPLL(T) combines
reasoning about the Boolean structure of a formula with reasoning about theory facts
to decide satisfiability of a given formula.

In an attempt to generate similar technological advancements in program
synthesis, we propose a new algorithm for program synthesis called
CounterExample-Guided Inductive Synthesis(T), where T is a given first-order theory for which
we have a specialised solver. Similar to its counterpart DPLL(T), the CEGIS(T)
architecture features communication between a synthesizer and a theory solver,
which results in a much more efficient exploration of the search space.

While standard CEGIS architectures [19,30] already make use of SMT solvers, 
the typical role of such a solver is restricted to validating candidate solutions and 
providing concrete counterexamples that direct subsequent search. By contrast, 
CEGIS(T) allows the theory solver to communicate generalised constraints back 
to the synthesizer, thus enabling more significant pruning of the search space. 

There are instances of more sophisticated collaboration between a program 
synthesizer and theory solvers. The most obvious such instance is the program 
synthesizer inside the CVC4 SMT solver [27]. This approach features a very 
tight coupling between the two components (i.e., the synthesizer and the theory 
solvers) that takes advantage of the particular strengths of the SMT solver by 
reformulating the synthesis problem as the problem of refuting a universally
quantified formula (SMT solvers are better at refuting universally quantified 
formulae than at proving them). Conversely, in our approach we maintain a 
clear separation between the synthesizer and the theory solver while performing 
comprehensive and well-defined communication between the two components. 
This enables the flexible combination of CEGIS with a variety of theory solvers, 
which excel at exploring different solution spaces. 


Contributions 


- We propose CEGIS(T), a program synthesis architecture that facilitates the
communication between an inductive synthesizer and a solver for a first-order
theory, resulting in an efficient exploration of the search space.

- We present two exemplars of this architecture, one based on Fourier-Motzkin
(FM) variable elimination [7] and one using an off-the-shelf SMT solver.

- We have implemented CEGIS(T) and compared it against state-of-the-art
program synthesizers on benchmarks that require intricate constants in the
solution.


2 Preliminaries 


2.1 The Program Synthesis Problem 


Program synthesis is the task of automatically generating programs that satisfy 
a given logical specification. A program synthesizer can be viewed as a solver for 
existential second-order logic. An existential second-order logic formula allows 
quantification over functions as well as ground terms [28]. 

The input specification provided to a program synthesizer is of the form
$\exists P.\, \forall x.\, \sigma(P, x)$, where $P$ ranges over functions (where a function is represented
by the program computing it), $x$ ranges over ground terms, and $\sigma$ is a
quantifier-free formula.


2.2 CounterExample Guided Inductive Synthesis


CounterExample-Guided Inductive Synthesis (CEGIS) is a popular approach to 
program synthesis, and is an iterative process. Each iteration performs inductive 
generalisation based on counterexamples provided by a verification oracle. Essen- 
tially, the inductive generalisation uses information about a limited number of 
inputs to make claims about all the possible inputs in the form of candidate 
solutions. 

The CEGIS framework is illustrated in Fig. 1 and consists of two phases:
the synthesis phase and the verification phase. Given the specification of the
desired program, $\sigma$, the inductive synthesis procedure generates a candidate
program $P^*$ that satisfies $\sigma(P^*, x)$ for a subset $x_{\mathit{inputs}}$ of all possible inputs. The
candidate program $P^*$ is passed to the verification phase, which checks whether
it satisfies the specification $\sigma(P^*, x)$ for all possible inputs. This is done by
checking whether $\neg\sigma(P^*, x)$ is unsatisfiable. If so, $\forall x.\, \sigma(P^*, x)$ is valid, and we
have successfully synthesized a solution and the algorithm terminates. Otherwise,
the verifier produces a counterexample c from the satisfying assignment, which
is then added to the set of inputs passed to the synthesizer, and the loop repeats.

Fig. 1. CEGIS block diagram

The method used in the synthesis and verification blocks varies in differ- 
ent CEGIS implementations; our CEGIS implementation uses Bounded Model 
Checking [8]. 


2.3 DPLL(T) 


DPLL(T) is an extension of the DPLL algorithm, used by most propositional
SAT solvers, by a theory T. We give a brief overview of DPLL(T) and compare
DPLL(T) with CEGIS(T).

Given a formula F from a theory T, a propositional formula Fp is created
from F in which the theory atoms are replaced by Boolean variables (the
“propositional skeleton”). The standard DPLL algorithm, comprising DECIDE, Boolean
Constraint Propagation (BCP), ANALYZE-CONFLICT and BACKTRACK,
generates an assignment to the Boolean variables in Fp, as illustrated in Fig. 2. The
theory solver then checks whether this assignment is still consistent when the 
Boolean variables are replaced by their original atoms. If so, a satisfying assign- 
ment for F has been found. Otherwise, a constraint over the Boolean variables 
in Fp is passed back to DECIDE, and the process repeats. 

In the very first SMT solvers, a full assignment to the Boolean variables 
was obtained, and then the theory solver returned only a single counterexample, 
similar to the implementations of CEGIS that are standard now. Such SMT 
solvers are prone to enumerating all possible counterexamples, and so the key 
improvement in DPLL(T) was the ability to pass back a more general constraint
over the variables in the formula as a counterexample [16]. Furthermore, modern 
variants of DPLL(T) call the theory solver on partial assignments to the variables 
in Fp. Our proposed, new synthesis algorithm offers equivalents of both of these 
ideas that have improved DPLL(T). 

Fig. 2. DPLL(T) with theory propagation


3 Motivating Example 


In each iteration of a standard CEGIS loop, the communication from the verifica- 
tion phase back to the synthesis phase is restricted to concrete counterexamples. 
This is particularly detrimental when synthesizing programs that require non- 
trivial constants. In such a setting, it is typical that a counterexample provided 
by the verification phase only eliminates a single candidate solution and, conse- 
quently, the synthesizer ends up enumerating possible constants. 

For illustration, let's consider the trivial problem of synthesizing a function 
f(x) where f(x) < 0 if x < 334455 and f(x) = 0 otherwise. One possible 
solution is f(x) = ite (x < 334455) -1 0, where ite stands for if-then-else. 

In order to make the synthesis task even simpler, we are going to assume that 
we know a part of this solution, namely we know that it must be of the form 
f(x) = ite (x < ?) -1 0, where "?" is a placeholder for the missing constant that 
we must synthesize. A plausible scenario for a run of CEGIS is presented next: 
the synthesis phase guesses f(x) = ite (x < 0) -1 0, for which the verification 
phase returns x = 0 as a counterexample. In the next iteration of the CEGIS 
loop, the synthesis phase guesses f(x) = ite (x < 1) -1 0 (which works for x = 0) 
and the verifier produces x = 1 as a counterexample. Following the same pattern, 
the synthesis phase will enumerate all the candidates 


f(x) = ite (x < 2) -1 0 

... 

f(x) = ite (x < 334454) -1 0 


before finding the solution. This is caused by the fact that each of the concrete 
counterexamples 0,...,334454 eliminates only one candidate from the solution 
space. Consequently, we need to propagate more information from the verifier 
to the synthesis phase in each iteration of the CEGIS loop. 
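
The enumeration behaviour described above can be reproduced with a small sketch. This is not the authors' BMC-based implementation: the synthesis and verification phases are replaced by brute-force search over a finite window, the names `spec`, `synthesize`, `verify` and `cegis` are hypothetical, and the threshold is shrunk from 334455 to 50 (an assumption made only so that the demo finishes quickly).

```python
# Toy CEGIS loop for the template f(x) = ite (x < c) -1 0.
# THRESHOLD stands in for 334455; it is kept small so the demo runs fast.
THRESHOLD = 50


def spec(f, x):
    """Specification: f(x) < 0 if x < THRESHOLD, and f(x) = 0 otherwise."""
    return f(x) < 0 if x < THRESHOLD else f(x) == 0


def make_candidate(c):
    """Instantiate the template with the constant c."""
    return lambda x: -1 if x < c else 0


def synthesize(inputs, max_c=1000):
    """Return the smallest c >= 0 that meets the spec on all recorded inputs."""
    for c in range(max_c):
        f = make_candidate(c)
        if all(spec(f, x) for x in inputs):
            return c
    return None


def verify(c, domain=range(0, 200)):
    """Return a concrete counterexample from the finite domain, or None."""
    f = make_candidate(c)
    for x in domain:
        if not spec(f, x):
            return x
    return None


def cegis():
    """Plain CEGIS: only concrete counterexamples flow back to synthesis."""
    inputs, iterations = [], 0
    while True:
        iterations += 1
        c = synthesize(inputs)
        cex = verify(c)
        if cex is None:
            return c, iterations
        inputs.append(cex)


constant, iterations = cegis()
print(constant, iterations)  # 50 51
```

Each counterexample rules out exactly one constant, so the number of iterations grows linearly with the threshold; with the original constant the same loop would need 334456 iterations.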


Proving Properties of Programs. Synthesis engines can be used as reasoning 
engines in program analysers, and constants are important for this application. 
For illustration, let’s consider the very simple program below, which increments 
a variable x from 0 to 100000 and asserts that its value is less than 100005 on 
exit from the loop. 


1 int x=0; 
2 while (x<=100000) x++; 
3 assert(x<100005); 


Proving the safety of such a program, i.e., that the assertion at line 3 is not 
violated in any execution of the program, is a task well-suited for synthesis (the 
Syntax Guided Synthesis Competition [5] has a track dedicated to synthesizing 
safety invariants). For this example, a safety invariant is x < 100002, which holds 
on entrance to the loop, is inductive with respect to the loop’s body, and implies 
the assertion on exit from the loop. 
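
As a sanity check of the three conditions just listed, the following sketch tests a candidate invariant of the form x < k by brute force. The function name `check_invariant` and the finite window of states are assumptions made for illustration; an exhaustive check over a window is not a proof, it only mirrors the initiation, consecution and safety conditions on the states that matter for this loop.

```python
def check_invariant(k, lo=-10, hi=100020):
    """Check the candidate invariant x < k on the integer window [lo, hi)."""
    inv = lambda x: x < k
    guard = lambda x: x <= 100000  # loop condition of the example program
    states = range(lo, hi)
    # Initiation: the invariant holds on entry to the loop (x = 0).
    initiation = inv(0)
    # Consecution: the invariant is preserved by one execution of the body.
    consecution = all(inv(x + 1) for x in states if inv(x) and guard(x))
    # Safety: on loop exit, the invariant implies the assertion x < 100005.
    safety = all(x < 100005 for x in states if inv(x) and not guard(x))
    return initiation, consecution, safety


print(check_invariant(100002))  # (True, True, True)
print(check_invariant(100001))  # (True, False, True): not inductive
print(check_invariant(100006))  # (True, True, False): too weak for the assertion
```

Only constants in a narrow range work, which is why a synthesizer that has to guess or enumerate the constant struggles on this example.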

While it is very easy for a human to deduce this invariant, the need for a non- 
trivial constant makes it surprisingly difficult for state-of-the-art synthesizers: 
both CVC4 (version 1.5) [27] and EUSolver (version 2017-06-15) [3] fail to find 
a solution in an hour. 


4 CEGIS(T) 


4.1 Overview 


In this section, we describe the architecture of CEGIS(T), which is obtained by 
augmenting the standard CEGIS loop with a theory solver. As we are particularly 
interested in the synthesis of programs with constants, we present CEGIS(T) 
from this particular perspective. In such a setting, CEGIS is responsible for 
synthesizing program skeletons, whereas the theory solver generates constraints 
over the literals that denote constants. These constraints are then propagated 
back to the synthesizer. 

In order to explain the main ideas behind CEGIS(T) in more detail, we 
first differentiate between a candidate solution, a candidate solution skeleton, 
a generalised candidate solution and a final solution. 


Definition 1 (Candidate solution). Using the notation in Sect. 2.2, a program 
$P$ is a candidate solution if $\forall x_{inputs}.\ \sigma(P, x_{inputs})$ is true for some subset 
$x_{inputs}$ of all possible $x$. 


Definition 2 (Candidate solution skeleton). Given a candidate solution 
P, the skeleton of P, denoted by P[?], is obtained by replacing each constant in 
P with a hole. 

Fig. 3. CEGIS(T) 


Definition 3 (Generalised candidate solution). Given a candidate solution 
skeleton $P[?]$, we obtain a generalised candidate $P[v]$ by filling each hole in 
$P[?]$ with a distinct symbolic variable, i.e., variable $v_i$ will correspond to the i-th 
hole. Then $v = [v_1, \ldots, v_n]$, where $n$ denotes the number of holes in $P[?]$. 


Definition 4 (Final solution). A candidate solution $P$ is a final solution if 
the formula $\forall x.\ \sigma(P, x)$ is valid. 


Example 1 (Candidate solution, candidate solution skeleton, generalised 
candidate solution, final solution). Given the example in Sect. 3, if $x_{inputs} = \{0\}$, 
then $f(x) = -2$ is a candidate solution. The corresponding candidate skeleton 
is $f[?](x) =\ ?$ and the generalised candidate is $f[v_1](x) = v_1$. A final solution for 
this example is $f(x) = \mathrm{ite}\,(x < 334455)\ {-1}\ 0$. 
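
Definitions 2 and 3 can be illustrated with a small sketch over a toy expression representation. The encoding is an assumption made for this example only: expressions are nested tuples whose first element is an operator name, Python integers are constants, and strings are program variables. The helper names `skeleton`, `generalise` and `instantiate` are hypothetical.

```python
from itertools import count

HOLE = "?"


def skeleton(expr):
    """Definition 2: replace every integer constant in expr by a hole."""
    if isinstance(expr, int):
        return HOLE
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(skeleton(arg) for arg in expr[1:])
    return expr  # a program variable such as "x"


def generalise(skel, fresh=None):
    """Definition 3: fill the i-th hole (left to right) with the variable v_i."""
    fresh = count(1) if fresh is None else fresh
    if skel == HOLE:
        return f"v{next(fresh)}"
    if isinstance(skel, tuple):
        return (skel[0],) + tuple(generalise(arg, fresh) for arg in skel[1:])
    return skel


def instantiate(gen, valuation):
    """Substitute constants for the symbolic variables of a generalised candidate."""
    if isinstance(gen, tuple):
        return (gen[0],) + tuple(instantiate(arg, valuation) for arg in gen[1:])
    return valuation.get(gen, gen)


candidate = ("ite", ("<", "x", 100), -3, 1)
skel = skeleton(candidate)
gen = generalise(skel)
print(skel)  # ('ite', ('<', 'x', '?'), '?', '?')
print(gen)   # ('ite', ('<', 'x', 'v1'), 'v2', 'v3')
print(instantiate(gen, {"v1": 334455, "v2": -1, "v3": 0}))
```

Instantiating the generalised candidate with a valuation returned by a theory solver yields a concrete program again, here the final solution of the running example.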


The communication between the synthesizer and the theory solver in 
CEGIS(T) is illustrated in Fig. 3 and can be described as follows: 


- The CEGIS architecture (enclosed in a red rectangle) deduces the candidate 
solution P*, which is provided to the theory solver. 

- The theory solver (enclosed in a blue rectangle) obtains the skeleton P*[?] 
of P* and generalises it to P*[v] in the box marked CONSTANT REMOVAL. 
Subsequently, DEDUCTION attempts to find a constraint over v describing 
those values for which P*[v] is a final solution. This constraint is propagated 
back to CEGIS. Whenever there is no valuation of v for which P*[v] becomes 
a final solution, the constraint needs to block the current skeleton P*[?]. 

The CEGIS(T) algorithm is given as Algorithm 1 and proceeds as follows: 


- CEGIS synthesis phase: checks the satisfiability of $\forall x_{inputs}.\ \sigma(P, x_{inputs})$, 
where $x_{inputs}$ is a subset of all possible $x$, and obtains a candidate solution $P^*$. 
If this formula is unsatisfiable, then the synthesis problem has no solution. 

- CEGIS verification phase: checks whether there exists a concrete 
counterexample for the current candidate solution by checking the satisfiability of 
the formula $\neg\sigma(P^*, x)$. If the result is UNSAT, then $P^*$ is a final solution to 
the synthesis problem. If the result is SAT, a concrete counterexample cex 
can be extracted from the satisfying assignment. 

- Theory solver: if $P^*$ contains constants, then they are eliminated, resulting 
in the skeleton $P^*[?]$, which is afterwards generalised to $P^*[v]$. The goal of 
the theory solver is to find T-implied literals and communicate them back to 
the CEGIS part in the form of a constraint, $C(P, P^*, v)$. In Algorithm 1, this 
is done by Deduction($\sigma$, $P^*[v]$). The result of Deduction($\sigma$, $P^*[v]$) is of the 
following form: whenever there exists a valuation of $v$ for which the current 
skeleton $P^*[?]$ is a final solution, res = true and $C(P, P^*, v) = \bigwedge_{i=1:n} v_i = c_i$, 
where the $c_i$ are constants; otherwise, res = false and $C(P, P^*, v)$ needs to block 
the current skeleton $P^*[?]$, i.e., $C(P, P^*, v) = P[?] \neq P^*[?]$. 

- CEGIS learning phase: adds new information to the problem specification. 
If we did not use the theory solver (i.e., the candidate $P^*$ found by the 
synthesizer did not contain constants or the problem specification was out of 
the theory solver's scope), then the learning would be limited to adding the 
concrete counterexample cex obtained from the verification phase to the set 
$x_{inputs}$. However, if the theory solver is used and returns res = true, then 
the second element in the tuple contains valuations for $v$ such that $P^*[v]$ is 
a final solution. If res = false, then the second element blocks the current 
skeleton and needs to be added to $\sigma$. 
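
The four phases above can be sketched as a toy loop. The sketch makes strong simplifying assumptions that are not part of the paper: every phase, including the deduction step, is a brute-force search over small finite ranges (standing in for the BMC-based synthesizer and for the FM or SMT theory solver), only two hypothetical skeletons are available, and the constant 334455 of the running example is shrunk to 20. All names (`SKELETONS`, `synthesise`, `verify`, `deduction`, `cegis_t`) are illustrative.

```python
from itertools import product

THRESHOLD = 20           # stands in for 334455
DOMAIN = range(0, 60)    # finite stand-in for "all possible x"
SMALL = range(-3, 4)     # constants the toy synthesizer may guess
WIDE = range(-3, 40)     # constants the toy deduction step may try


def spec(f, x):
    """f(x) < 0 if x < THRESHOLD, and f(x) = 0 otherwise."""
    return f(x) < 0 if x < THRESHOLD else f(x) == 0


# Each skeleton maps a valuation of its holes to a concrete function.
SKELETONS = {
    "?": (1, lambda v: lambda x: v[0]),
    "ite (x < ?) ? ?": (3, lambda v: lambda x: v[1] if x < v[0] else v[2]),
}


def synthesise(inputs, blocked):
    """Synthesis phase: first candidate that meets the spec on the known inputs."""
    for name, (holes, build) in SKELETONS.items():
        if name in blocked:
            continue
        for v in product(SMALL, repeat=holes):
            if all(spec(build(v), x) for x in inputs):
                return name, v
    return None


def verify(f):
    """Verification phase: a concrete counterexample from DOMAIN, or None."""
    return next((x for x in DOMAIN if not spec(f, x)), None)


def deduction(name):
    """Theory-solver stand-in: constants for the skeleton, or a blocking constraint."""
    holes, build = SKELETONS[name]
    for v in product(WIDE, repeat=holes):
        if verify(build(v)) is None:
            return True, v
    return False, name


def cegis_t():
    inputs, blocked, iterations = [], set(), 0
    while True:
        iterations += 1
        candidate = synthesise(inputs, blocked)
        if candidate is None:
            return None, iterations
        name, v = candidate
        cex = verify(SKELETONS[name][1](v))
        if cex is None:
            return (name, v), iterations
        res, constraint = deduction(name)
        if res:  # learning phase, successful deduction
            return (name, constraint), iterations
        blocked.add(constraint)  # learning phase, block the skeleton
        inputs.append(cex)


result, iterations = cegis_t()
print(result, iterations)  # ('ite (x < ?) ? ?', (20, -3, 0)) 2
```

In this toy setting the loop finishes in two iterations: the first blocks the constant skeleton, the second finds the constants for the ite skeleton in a single deduction step, whereas a loop driven by concrete counterexamples alone would need one iteration per candidate constant.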


4.2 CEGIS(T) with a Theory Solver Based on FM Elimination 


In this section we describe a theory solver based on FM variable elimination. 
Other techniques for eliminating existentially quantified variables can be used. 
For instance, one might use cylindrical algebraic decomposition [9] for 
specifications with non-linear arithmetic. In our case, whenever the specification 
$\sigma$ does not belong to linear arithmetic, the FM theory solver is not called. 

As mentioned above, we need to produce a constraint over variables $v$ 
describing the situation when $P^*[v]$ is a final solution. For this purpose, we consider 
the formula $\exists x.\ \neg\sigma(P^*[v], x)$, where $v$ is a satisfiability witness if the 
specification $\sigma$ admits a counterexample $x$ for $P^*$. Let $E(v)$ be the formula obtained 
by eliminating $x$ from $\exists x.\ \neg\sigma(P^*[v], x)$. If $\neg E(v)$ is satisfiable, any satisfiability 
witness gives us the necessary valuation for $v$: 

$$C(P, P^*, v) = \bigwedge_{i=1:n} v_i = c_i.$$ 

Algorithm 1. CEGIS(T) 

 1: function CEGIS(T)(specification $\sigma$) 
 2:   while true do 
 3:     /* CEGIS synthesis phase */ 
 4:     if $\forall x_{inputs}.\ \sigma(P, x_{inputs})$ is UNSAT then return Failure; 
 5:     else 
 6:       $P^*$ = satisfiability witness for $\forall x_{inputs}.\ \sigma(P, x_{inputs})$; 
 7:       /* CEGIS verification phase */ 
 8:       if $\neg\sigma(P^*, x)$ is UNSAT then return Final solution $P^*$; 
 9:       else 
10:         cex = satisfiability witness for $\neg\sigma(P^*, x)$; 
11:         /* Theory solver */ 
12:         if $P^*$ contains constants then 
13:           Obtain $P^*[?]$ from $P^*$; 
14:           Generalise $P^*[?]$ to $P^*[v]$; 
15:           (res, $C(P, P^*, v)$) = Deduction($\sigma$, $P^*[v]$); 
16:         end if 
17:       end if 
18:     end if 
19:     /* CEGIS learning phase */ 
20:     if res then 
21:       $C(P, P^*, v)$ is of the form $\bigwedge_{i=1:n} v_i = c_i$. 
22:       return Final solution $P^*[c]$; 
23:     else 
24:       $\sigma(P, x) = \sigma(P, x) \wedge C(P, P^*, v)$; 
25:       $x_{inputs} = x_{inputs} \cup \{cex\}$; 
26:     end if 
27:   end while 
28: end function 


If $\neg E(v)$ is UNSAT, then the current skeleton $P^*[?]$ needs to be blocked. This 
reasoning is supported by Lemma 1 and Corollary 1. 


Lemma 1. Let $E(v)$ be the formula that is obtained by eliminating $x$ from 
$\exists x.\ \neg\sigma(P^*[v], x)$. Then, any witness $v^*$ to the satisfiability of $\neg E(v)$ gives us 
a final solution $P^*[v^*]$ to the synthesis problem. 


Proof. From the fact that $E(v)$ is obtained by eliminating $x$ from 
$\exists x.\ \neg\sigma(P^*[v], x)$, we get that $E(v)$ is equivalent to $\exists x.\ \neg\sigma(P^*[v], x)$ (we use 
$\equiv$ to denote equivalence): 

$$E(v) \equiv \exists x.\ \neg\sigma(P^*[v], x).$$ 

Then: 

$$\neg E(v) \equiv \forall x.\ \sigma(P^*[v], x).$$ 

Consequently, any $v^*$ satisfying $\neg E(v)$ also satisfies $\forall x.\ \sigma(P^*[v], x)$. From 
$\forall x.\ \sigma(P^*[v^*], x)$ and Definition 4 we get that $P^*[v^*]$ is a final solution. 

Corollary 1. Let $E(v)$ be the formula that is obtained by eliminating $x$ from 
$\exists x.\ \neg\sigma(P^*[v], x)$. If $\neg E(v)$ is unsatisfiable, then the corresponding synthesis 
problem does not admit a solution for the skeleton $P^*[?]$. 


Proof. Given that $\neg E(v) \equiv \forall x.\ \sigma(P^*[v], x)$, if $\neg E(v)$ is unsatisfiable, so is 
$\forall x.\ \sigma(P^*[v], x)$, meaning that there is no valuation for $v$ such that the 
specification $\sigma$ is obeyed for all inputs $x$. 


For the current skeleton $P^*[?]$, the constraint $E(v)$ generalises the concrete 
counterexample cex (found during the CEGIS verification phase) in 
the sense that the instantiation $v^*$ of $v$ for which cex failed the specification, 
i.e., $\neg\sigma(P^*[v^*], cex)$, is a satisfiability witness for $E(v)$. This is true as 
$E(v) \equiv \exists x.\ \neg\sigma(P^*[v], x)$, which means that the satisfiability witness $(v^*, cex)$ 
for $\neg\sigma(P^*[v], x)$ projected on $v$ is a satisfiability witness for $E(v)$. 


Disjunction. The specification $\sigma$ and the candidate solution may contain 
disjunctions. However, most theory solvers (and in particular the FM variable 
elimination [7]) work on conjunctive fragments only. A naive approach could 
use case-splitting, i.e., transforming the formula into Disjunctive Normal Form 
(DNF) and then solving each clause separately. This can result in a number of 
clauses exponential in the size of the original formula. Instead, we handle dis- 
junction using the Boolean Fourier Motzkin procedure [20,32]. As a result, the 
constraints we generate may be non-clausal. 


Applying CEGIS(T) with FM to the Motivational Example. We recall 
the example in Sect. 3 and apply CEGIS(T). The problem is 


$$\exists f.\ \forall x.\ (x < 334455 \rightarrow f(x) < 0) \wedge (x \geq 334455 \rightarrow f(x) = 0)$$ 

which gives us the following specification: 

$$\sigma(f, x) = (x \geq 334455 \vee f(x) < 0) \wedge (x < 334455 \vee f(x) = 0).$$ 


The first synthesis phase generates the candidate $f^*(x) = 0$, for which the 
verification phase returns the concrete counterexample $x = 0$. As this candidate 
contains the constant 0, we generalise it to $f^*[v_1](x) = v_1$, for which we get 

$$\sigma(f^*[v_1], x) = (x \geq 334455 \vee v_1 < 0) \wedge (x < 334455 \vee v_1 = 0).$$ 

Next, we use FM to eliminate $x$ from 

$$\exists x.\ \neg\sigma(f^*[v_1], x) = \exists x.\ (x < 334455 \wedge v_1 \geq 0) \vee (x \geq 334455 \wedge v_1 \neq 0).$$ 

Note that, given that the formula $\neg\sigma(f^*[v_1], x)$ is in DNF, for convenience we directly 
apply FM to each disjunct and obtain $E(v_1) = v_1 \geq 0 \vee v_1 \neq 0$, which 
characterises all the values of $v_1$ for which there exists a counterexample. When 
negating $E(v_1)$ we get $v_1 < 0 \wedge v_1 = 0$, which is UNSAT. As there is no valuation of 
$v_1$ for which the current $f^*$ is a final solution, the result returned by the theory 
solver is (false, $f[?] \neq f^*[?]$), which is used to augment the specification. 
Subsequently, a new CEGIS(T) iteration starts. The learning phase has changed the 
specification $\sigma$ to 

$$\sigma(f, x) = (x \geq 334455 \vee f(x) < 0) \wedge (x < 334455 \vee f(x) = 0) \wedge f[?] \neq\ ?.$$ 


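This forces the synthesis phase to pick a new candidate solution with a different 
skeleton. The new candidate solution we get is $f^*(x) = \mathrm{ite}\,(x < 100)\ {-3}\ 1$, which 
works for the previous counterexample $x = 0$. However, the verification phase 
returns the counterexample $x = 100$. Again, this candidate contains constants, 
which we replace by symbolic variables, obtaining 

$$f^*[v_1, v_2, v_3](x) = \mathrm{ite}\,(x < v_1)\ v_2\ v_3.$$ 

Next, we use FM to eliminate $x$ from 

$$\begin{aligned}
\exists x.\ \neg\sigma(f^*[v_1, v_2, v_3], x) ={} & \exists x.\ \neg\big((x \geq 334455 \vee ((x < v_1 \rightarrow v_2 < 0) \wedge (x \geq v_1 \rightarrow v_3 < 0))) \wedge {} \\
 & \qquad\ \ (x < 334455 \vee ((x < v_1 \rightarrow v_2 = 0) \wedge (x \geq v_1 \rightarrow v_3 = 0)))\big) \\
={} & \exists x.\ \neg\big((x \geq 334455 \vee x \geq v_1 \vee v_2 < 0) \wedge (x \geq 334455 \vee x < v_1 \vee v_3 < 0) \wedge {} \\
 & \qquad\ \ (x < 334455 \vee x \geq v_1 \vee v_2 = 0) \wedge (x < 334455 \vee x < v_1 \vee v_3 = 0)\big) \\
={} & \exists x.\ (x < 334455 \wedge x < v_1 \wedge v_2 \geq 0) \vee (x < 334455 \wedge x \geq v_1 \wedge v_3 \geq 0) \vee {} \\
 & \qquad (x \geq 334455 \wedge x < v_1 \wedge v_2 \neq 0) \vee (x \geq 334455 \wedge x \geq v_1 \wedge v_3 \neq 0).
\end{aligned}$$ 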

As we work with integers, we can rewrite $x < 334455$ to $x \leq 334454$ and 
$x < v_1$ to $x \leq v_1 - 1$. Then, we obtain the following constraint $E(v_1, v_2, v_3)$ (as 
mentioned above, we applied FM to each disjunct in $\neg\sigma(f^*[v_1, v_2, v_3], x)$): 

$$E(v_1, v_2, v_3) = v_2 \geq 0 \vee (v_1 \leq 334454 \wedge v_3 \geq 0) \vee (v_1 \geq 334456 \wedge v_2 \neq 0) \vee v_3 \neq 0$$ 

whose negation is 

$$\neg E(v_1, v_2, v_3) = v_2 < 0 \wedge (v_1 > 334454 \vee v_3 < 0) \wedge (v_1 < 334456 \vee v_2 = 0) \wedge v_3 = 0.$$ 


A satisfiability witness is $v_1 = 334455$, $v_2 = -1$ and $v_3 = 0$. Thus, the result 
returned by the theory solver is (true, $v_1 = 334455 \wedge v_2 = -1 \wedge v_3 = 0$), which is 
used by CEGIS to obtain the final solution 

$$f^*(x) = \mathrm{ite}\,(x < 334455)\ {-1}\ 0.$$ 
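
The constraint obtained for the second candidate can be cross-checked numerically. The sketch below is only a sanity check under stated assumptions, not a Fourier-Motzkin implementation: it compares the eliminated formula E with a brute-force search for counterexamples on a small window of integers around the threshold, and the names `sigma`, `candidate`, `E`, `not_E`, `WINDOW` and `GRID` are hypothetical.

```python
T = 334455


def sigma(f, x):
    """Specification of the running example."""
    return (x >= T or f(x) < 0) and (x < T or f(x) == 0)


def candidate(v1, v2, v3):
    """Generalised candidate ite (x < v1) v2 v3."""
    return lambda x: v2 if x < v1 else v3


def E(v1, v2, v3):
    """Result of eliminating x: valuations that admit a counterexample."""
    return (v2 >= 0 or (v1 <= T - 1 and v3 >= 0)
            or (v1 >= T + 1 and v2 != 0) or v3 != 0)


def not_E(v1, v2, v3):
    """Negation of E as given in the text."""
    return (v2 < 0 and (v1 > T - 1 or v3 < 0)
            and (v1 < T + 1 or v2 == 0) and v3 == 0)


# Finite windows (an assumption): wide enough to contain a counterexample
# whenever one exists for the valuations in GRID.
WINDOW = range(T - 15, T + 16)
GRID = [(v1, v2, v3) for v1 in range(T - 5, T + 5)
        for v2 in range(-2, 3) for v3 in range(-2, 3)]


def has_counterexample(v):
    f = candidate(*v)
    return any(not sigma(f, x) for x in WINDOW)


elimination_ok = all(E(*v) == has_counterexample(v) for v in GRID)
negation_ok = all(not_E(*v) == (not E(*v)) for v in GRID)
witnesses = [v for v in GRID if not_E(*v)]
print(elimination_ok, negation_ok, witnesses)
```

On this grid the only valuations satisfying the negated constraint have v1 = 334455, a negative v2 and v3 = 0, which matches the witness reported above.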


4.3 CEGIS(T) with an SMT-based Theory Solver 


For our second variant of a theory solver, we make use of an off-the-shelf 
SMT solver that supports quantified first-order formulae. This approach is more 
generic than the one described in Sect. 4.2, as there are solvers for a broad range 
of theories. 
Recall that our goal is to obtain a constraint $C(P, P^*, v)$ that either 
characterises the valuations of $v$ for which $P^*[v]$ is a final solution or blocks $P^*[?]$ 
whenever no such valuation exists. Consequently, we use the SMT solver to check 
the satisfiability of the formula 

$$\Phi = \forall x.\ \sigma(P^*[v], x).$$ 

If $\Phi$ is satisfiable, then any satisfiability witness $c$ gives us a valuation for $v$ 
such that $P^*[c]$ is a final solution: $C(P, P^*, v) = \bigwedge_{i=1:n} v_i = c_i$. Conversely, 
if $\Phi$ is unsatisfiable then $C(P, P^*, v)$ must block the current skeleton $P^*[?]$: 
$C(P, P^*, v) = P[?] \neq P^*[?]$. 


Applying SMT-based CEGIS(T) to the Motivational Example. Again, 
we recall the example in Sect. 3. We will solve it by using SMT-based CEGIS(T) 
for the theory of linear arithmetic. For this purpose, we assume that the synthesis 
phase finds the same sequence of candidate solutions as in Sect. 4.2. Namely, the 
first candidate is $f^*(x) = 0$, which gets generalised to $f^*[v_1](x) = v_1$. Then, the 
first SMT call is for $\forall x.\ \sigma(v_1, x)$, where 

$$\sigma(v_1, x) = (x \geq 334455 \vee v_1 < 0) \wedge (x < 334455 \vee v_1 = 0).$$ 

The SMT solver returns UNSAT, which means that $C(f, f^*, v_1) = f[?] \neq\ ?$. 
The second candidate is $f^*(x) = \mathrm{ite}\,(x < 100)\ {-3}\ 1$, which generalises to 
$f^*[v_1, v_2, v_3](x) = \mathrm{ite}\,(x < v_1)\ v_2\ v_3$. The corresponding call to the SMT solver 
is for $\forall x.\ \sigma((\mathrm{ite}\,(x < v_1)\ v_2\ v_3), x)$, for which we obtain the satisfiability witness 
$v_1 = 334455$, $v_2 = -1$ and $v_3 = 0$. Then $C(f, f^*, v_1, v_2, v_3) = v_1 = 334455 \wedge v_2 = -1 \wedge v_3 = 0$, 
which gives us the same final solution we obtained when using FM 
in Sect. 4.2. 


5 Experimental Evaluation 


5.1 Implementation 


Incremental Satisfiability Solving. Our implementation of CEGIS may some- 
times perform hundreds of loop iterations before finding the correct solution. 
Recall that the synthesis block of CEGIS is based on Bounded Model Checking 
(BMC). Ultimately, this BMC module performs calls to a SAT solver. Conse- 
quently, we may have hundreds of calls to this SAT solver, which are all very 
similar (the same base specification with some extra constraints added in each 
iteration). This makes CEGIS a prime candidate for incremental SAT solving. 
We implemented incremental solving in the synthesis block of CEGIS. 


5.2 Benchmarks 


We have selected a set of bitvector benchmarks from the Syntax-Guided Synthe- 
sis (SyGuS) competition [4] and a set of benchmarks synthesizing safety invari- 
ants and danger invariants for C programs [10]. All benchmarks are written in 
SyGuS-IF [26], a variant of SMT-LIB2. 
Given that the syntactic restrictions (called the grammar or the template) 
provided in the SyGuS benchmarks contain all the necessary non-trivial con- 
stants, we removed them completely from these benchmarks. Removing just the 
non-trivial constants and keeping the rest of the grammar (with the only con- 
stants being 0 and 1) would have made the problem much more difficult, as 
the constants would have had to be incrementally constructed by applying the 
operators available to 0 and 1. 

We group the benchmarks into three categories: invariant generation, which 
covers danger invariants, safety invariants and the class of invariant generation 
benchmarks from the SyGuS competition; hackers/crypto, which includes bench- 
marks from hackers-delight and cryptographic circuits; and comparisons, com- 
posed of benchmarks that require synthesizing longer programs with compar- 
isons, e.g., finding the maximum value of 10 variables. 


5.3 Experimental Setup 


We conduct the experimental evaluation on a 12-core 2.40 GHz Intel Xeon E5- 
2440 with 96 GB of RAM and Linux OS. We use the Linux times command to 
measure CPU time used for each benchmark. The runtime is limited to 600s per 
benchmark. We use MiniSat [12] as the SAT solver, and Z3 v4.5.1 [22] as the 
SMT solver in CEGIS(T) with the SMT-based theory solver. The SAT solver could, 
in principle, be replaced with Z3 to solve benchmarks over a broader range of 
theories. 
We present results for four different configurations of CEGIS: 


- CEGIS(T)-FM: CEGIS(T) with Fourier-Motzkin as the theory solver; 

- CEGIS(T)-SMT: CEGIS(T) with Z3 as the theory solver; 

- CEGIS: basic CEGIS as described in Sect. 2.2; 

- CEGIS-Inc: basic CEGIS with incremental SAT solving. 


We compare our results against the latest release of CVC4, version 1.5. As 
we are interested in running our benchmarks without any syntactic template, 
the first reason for choosing CVC4 [6] as our comparison point is the fact that 
it performs well when no such templates are provided. This is illustrated by the 
fact that it won the Conditional Linear Integer Arithmetic track of the SyGuS 
competition 2017 [4], one of two tracks where a syntactic template was not used. 
The other track without syntactic templates is the invariant generation track, in 
which CVC4 was a close second to LoopInvGen [24]. A second reason for picking 
CVC4 is its overall good performance on all benchmarks, whereas LoopInvGen 
is a solver specialised to invariant generation. 

We also give a row of results for a hypothetical 4-core implementation, as 
would be allowed in the SyGuS Competition, running 4 configurations in 
parallel: CEGIS(T)-FM, CEGIS(T)-SMT, CEGIS, and CEGIS-Inc. A link to the full 
experimental environment, including scripts to reproduce the results, all 
benchmarks and the tool, is provided in the footnote as an Open Virtual Appliance 
(OVA)¹. 


¹ www.cprover.org/synthesis. 

Table 1. Experimental results: for every set of benchmarks, we give the number 
of benchmarks solved by each configuration within the timeout (#) and the average 
time in seconds taken per solved benchmark (s) 


Configuration             |    inv     |   hackers   | comparisons |    other    |   total 
                          |  #     s   |  #      s   |  #      s   |  #      s   |  #     s 
CEGIS(T)-SMT              | 33   33.1  |  4     2.5  |  3   195.5  | 16    14.0  | 56   34.1 
CEGIS(T)-FM               | 16   93.1  |  4    52.8  |  1    0.06  | 12     0.7  | 33   51.8 
CEGIS                     | 16   31.3  |  4    52.0  |  1    0.03  | 14     5.3  | 35   22.4 
CEGIS-Inc                 | 16   39.4  |  5   167.4  |  1    0.08  | 14     4.2  | 36   42.4 
Multi-core                | 33   32.5  |  5    92.2  |  3   194.7  | 16     3.8  | 57   38.3 
CVC4                      |  6    6.5  |  6   0.002  |  7   0.006  | 11   0.003  | 30    1.3 
# benchmarks              | 48         |  6          |  7          | 19          | 80 
CVC4 with grammar         |  4   45.8  |  0       -  |  0       -  |  6     2.4  | 10   19.8 
# benchmarks with grammar |  8         |  3          |  7          | 16          | 34 

5.4 Results 


The results are given in Table 1. Taken together, our CEGIS configurations (i.e., 
CEGIS multi-core) solve 27 more benchmarks than CVC4, but the average time 
per benchmark is significantly higher. 

As expected, both CEGIS(T)-SMT and CEGIS(T)-FM solve more of the 
invariant generation benchmarks which require synthesizing arbitrary constants 
than CVC4. Conversely, CVC4 performs better on benchmarks that require 
synthesizing long programs with many comparison operations, e.g., finding the 
maximum value in a series of numbers. CVC4 solves more of the hackers-delight and 
cryptographic circuit benchmarks, none of which require constants. 

Our implementation of basic CEGIS (and consequently of all configurations 
built on top of this) only increases the length of the synthesized program when 
no program of a shorter length exists. Thus, it is expensive to synthesize longer 
programs. However, a benefit of this architecture is that the programs we 
synthesize are of the minimum possible length. Many of the expressions synthesized by 
CVC4 are very large. This has been noted previously in the Syntax-Guided 
Synthesis Competition [5], and synthesizing without the syntactic template causes 
the expressions synthesized to be even longer. 

Although CEGIS-Inc is quicker per iteration of the CEGIS loop than basic 
CEGIS, the average time per benchmark is not significantly better because of the 
variation in times produced by CEGIS. We hypothesise that the use of 
incremental solving makes CEGIS-Inc more prone to getting stuck exploring "bad" areas 
of the solution space than basic CEGIS, and so it requires more iterations than 
basic CEGIS for some benchmarks. The incremental solving preserves clauses 
learnt from any conflicts in previous iterations, which means that each SAT solv- 
ing iteration will begin from exactly the same state as the previous one. The basic 
implementation doesn't preserve these clauses and so is free to start exploring a 
new part of the search space each iteration. These effects could be mitigated by 
running multiple incremental solving instances in parallel. 

In order to validate the assumption that CVC4 works better without a tem- 
plate than with one where the non-trivial constants were removed (see Sect. 5.2), 
we also ran CVC4 on a subset of the benchmarks with a syntactic template 
comprising the full instruction set we give to CEGIS, plus the constants 0 and 1. 
Note that for some benchmarks it is not possible to add a grammar because the 
SyGuS-IF language does not allow syntactic templates for benchmarks that 
use the loop invariant syntax. With a grammar, CVC4 solves fewer of the bench- 
marks, and takes longer per benchmark. The syntactic template is helpful only 
in cases where non-trivial constants are needed and the non-trivial constants are 
contained within the template. 

We ran EUSolver on the benchmarks with the syntactic templates, but the 
bitvector support is incomplete and missing some key operations. As a result 
EUSolver was unable to solve any benchmarks in the set, and so we have not 
included the results in the table. 


Benefit of Literal Constants. We have investigated how useful the constants 
in the problem specification are, and have tried a configuration that seeds all 
constants in the problem specification as hints into the synthesis engine. This 
proved helpful for basic CEGIS only but not for the CEGIS(T) configurations. 
Our hypothesis is that the latter do not benefit from this because they already 
have good support for computing constants. We dropped this option in the 
results presented in this section. 


5.5 Threats to Validity 


Benchmark Selection: We report an assessment of our approach on a diverse 
selection of benchmarks. Nevertheless, the set of benchmarks is limited within 
the scope of this paper, and the performance may not generalise to other bench- 
marks. 


Comparison with State of the Art: CVC4 has not, as far as we are aware, been 
used for synthesis of bitvector functions without syntactic templates, and so this 
unanticipated use case may not have been fully tested. We are unable to compare 
all results to other solvers from the SyGuS Competition because EUSolver and 
EUPhony do not support synthesizing bitvector programs without a syntactic 
template, EUSolver's support for bitvectors is incomplete even when used with a 
template, LoopInvGen and DryadSynth do not support bitvectors, and E3Solver 
tackles only Programming By Example benchmarks [5]. 


Choice of Theories: We evaluated the benefits of CEGIS(T) in the context of 
two specific theory instances. While the improvements in our experiments are 
significant, it is uncertain whether this will generalise to other theories. 

6 Related Work 


The traditional view of program synthesis is that of synthesis from complete 
specifications [21]. Such specifications are often unavailable, difficult to write, or 
expensive to check against using automated verification techniques. This has led 
to the proposal of inductive synthesis and, more recently, of oracle-based induc- 
tive synthesis, in which the complete specification is not available and oracles 
are queried to choose programs [19]. 

A well-known application of CEGIS is program sketching [29,31], where the 
programmer uses a partial program, called a sketch, to describe the desired imple- 
mentation strategy, and leaves the low-level details of the implementation to an 
automated synthesis procedure. Inspired by sketching, Syntax-Guided Program 
Synthesis (SyGuS) [2] requires the user to supplement the logical specification 
provided to the program synthesizer with a syntactic template that constrains 
the space of solutions. In contrast to SyGuS, our aim is to improve the efficiency 
of the exploration to the point that user guidance is no longer required. 

Another very active area of program synthesis is denoted by component-based 
approaches [1, 13-15, 17,18,25]. Such approaches are concerned with assembling 
programs from a database of existing components and make use of various tech- 
niques, from counterexample-guided synthesis [17] to type-directed search with 
lightweight SMT-based deduction and partial evaluation [14] and Petri-nets [15]. 
The techniques developed in the current paper are applicable to any component- 
based synthesis approach that relies on counterexample-guided inductive synthe- 
sis. 

Heuristics for constant synthesis are presented in [11], where the solution 
language is parameterised, inducing a lattice of progressively more expressive 
languages. One of the parameters is word width, which allows synthesizing pro- 
grams with constants that satisfy the specification for smaller word widths. 
Subsequently, heuristics extend the program (including the constants) to the 
required word width. As opposed to this work, CEGIS(T) denotes a systematic 
approach that does not rely on ad-hoc heuristics. 

Regarding the use of SMT solvers in program synthesis, they are frequently 
employed as oracles. By contrast, Reynolds et al. [27] present an efficient encod- 
ing able to solve program synthesis constraints directly within an SMT solver. 
Their approach relies on rephrasing the synthesis constraint as the problem of 
refuting a universally quantified formula, which can be solved using first-order 
quantifier instantiation. Conversely, in our approach we maintain a clear sepa- 
ration between the synthesizer and the theory solver, which communicate in a 
well-defined manner. In Sect. 5, we provide a comprehensive experimental 
comparison with the synthesizer described in [27]. 


7 Conclusion 


We proposed CEGIS(T), a new approach to program synthesis that combines 
the strengths of a counterexample-guided inductive synthesizer with those of a 
theory solver to provide a more efficient exploration of the solution space. We 
discussed two options for the theory solver, one based on FM variable elimination 
and one relying on an off-the-shelf SMT solver. Our experimental results showed 
that, although slower than CVC4, CEGIS(T) can solve, within a reasonable time, 
more benchmarks that require synthesizing arbitrary constants, on which CVC4 
fails. 

Abstract. We study the reactive synthesis problem for hyperproperties 
given as formulas of the temporal logic HyperLTL. Hyperproperties gen- 
eralize trace properties, i.e., sets of traces, to sets of sets of traces. Typical 
examples are information-flow policies like noninterference, which stipu- 
late that no sensitive data must leak into the public domain. Such prop- 
erties cannot be expressed in standard linear or branching-time temporal 
logics like LTL, CTL, or CTL*. We show that, while the synthesis problem 
is undecidable for full HyperLTL, it remains decidable for the $\exists^*$, 
$\exists^*\forall^1$, and the linear $\forall^*$ fragments. Beyond these fragments, the synthesis 
problem immediately becomes undecidable. For universal HyperLTL, we 
present a semi-decision procedure that constructs implementations and 
counterexamples up to a given bound. We report encouraging experimen- 
tal results obtained with a prototype implementation on example spec- 
ifications with hyperproperties like symmetric responses, secrecy, and 
information-flow. 


1 Introduction 


Hyperproperties [5] generalize trace properties in that they not only check 
the correctness of individual computation traces in isolation, but relate mul- 
tiple computation traces to each other. HyperLTL [4] is a logic for expressing 
temporal hyperproperties, by extending linear-time temporal logic (LTL) with 
explicit quantification over traces. HyperLTL has been used to specify a variety 
of information-flow and security properties. Examples include classical proper- 
ties like non-interference and observational determinism, as well as quantitative 
information-flow properties, symmetries in hardware designs, and formally 
verified error correcting codes [12]. For example, observational determinism can be 
expressed as the HyperLTL formula $\forall\pi\,\forall\pi'.\ \Box(I_\pi = I_{\pi'}) \rightarrow \Box(O_\pi = O_{\pi'})$, 
stating that, for every pair of traces, if the observable inputs are the same, then 
the observable outputs must be the same as well. While the satisfiability [9], model 
checking [4,12], and runtime verification [1,10] problems for HyperLTL have been 
studied, the reactive synthesis problem of HyperLTL is, so far, still open. 
In reactive synthesis, we automatically construct an implementation that is 
guaranteed to satisfy a given specification. A fundamental difference to verifi- 
cation is that there is no human programmer involved: in verification, the pro- 
grammer would first produce an implementation, which is then verified against 
the specification. In synthesis, the implementation is directly constructed from 
the specification. Because there is no programmer, it is crucial that the speci- 
fication contains all desired properties of the implementation: the synthesized 
implementation is guaranteed to satisfy the given specification, but nothing is 
guaranteed beyond that. The added expressive power of HyperLTL over LTL is 
very attractive for synthesis: with synthesis from hyperproperties, we can guaran- 
tee that the implementation does not only accomplish the desired functionality, 
but is also free of information leaks, is symmetric, is fault-tolerant with respect 
to transmission errors, etc. 

More formally, the reactive synthesis problem asks for a strategy, that is, a
tree branching on environment inputs whose nodes are labeled by the system 
output. Collecting the inputs and outputs along a branch of the tree, we obtain 
a trace. If the set of traces collected from the branches of the strategy tree 
satisfies the specification, we say that the strategy realizes the specification. 
The specification is realizable iff there exists a strategy tree that realizes the 
specification. With LTL specifications, we get trees where the trace on each 
individual branch satisfies the LTL formula. With HyperLTL, we additionally get 
trees where the traces between different branches are in a specified relationship. 
This is dramatically more powerful. 

Consider, for example, the well-studied distributed version of the reactive
synthesis problem, where the system is split into a set of processes, each of which
sees only a subset of the inputs. The distributed synthesis problem for LTL can
be expressed as the standard (non-distributed) synthesis problem for HyperLTL, 
by adding for each process the requirement that the process output is observa- 
tionally deterministic in the process input. HyperLTL synthesis thus subsumes 
distributed synthesis. The information-flow requirements realized by HyperLTL 
synthesis can, however, be much more sophisticated than the observational
determinism needed for distributed synthesis. Consider, for example, the dining
cryptographers problem [3]: three cryptographers $C_a$, $C_b$, and $C_c$ sit at a table in a
restaurant having dinner and either one of the cryptographers or, alternatively, the
NSA must pay for their meal. Is there a protocol where each cryptographer can
find out whether it was a cryptographer who paid or the NSA, but cannot find 
out which cryptographer paid the bill? 

Synthesis from LTL formulas is known to be decidable in doubly exponential 
time. The fact that the distributed synthesis problem is undecidable [21] imme- 
diately eliminates the hope for a similar general result for HyperLTL. However, 
since LTL is obviously a fragment of HyperLTL, this immediately leads to the 
question whether the synthesis problem is still decidable for fragments of Hyper- 
LTL that are close to LTL but go beyond LTL: when exactly does the synthesis 
problem become undecidable? From a more practical point of view, the interesting
question is whether semi-algorithms for distributed synthesis [7,14], which
have been successful in constructing distributed systems from LTL specifications
despite the undecidability of the general problem, can be extended to HyperLTL.

In this paper, we answer the first question by studying the $\exists^*$, $\exists^*\forall^1$, and the
linear $\forall^*$ fragments. We show that the synthesis problem for all three fragments
is decidable, and that the problem becomes undecidable as soon as we go beyond
these fragments. In particular, the synthesis problem for the full $\forall^*$ fragment,
which includes observational determinism, is undecidable.

We answer the second question by studying the bounded version of the synthesis
problem for the $\forall^*$ fragment. In order to detect realizability, we ask whether,
for a universal HyperLTL formula $\varphi$ and a given bound $n$ on the number of
states, there exists a representation of the strategy tree as a finite-state machine
with no more than $n$ states that satisfies $\varphi$. To detect unrealizability, we check
whether there exists a counterexample to realizability of bounded size. We show
that both checks can be effectively reduced to SMT solving. 


Related Work. HyperLTL [4] is a successor of the temporal logic SecLTL [6] 
used to characterize temporal information-flow. The model-checking [4,12],
satisfiability [9], and monitoring [1,10] problems, as well as the first-order
extension [17] of HyperLTL, have been studied before. To the best of the authors'
knowledge, this is the first work that considers the synthesis problem for temporal
hyperproperties. We base our algorithms on well-known synthesis algorithms such as
bounded synthesis [14], which itself is an instance of Safraless synthesis [18] for
$\omega$-regular languages. Further techniques that we adapt for hyperproperties are lazy
synthesis [11] and the bounded unrealizability method [15, 16].

Hyperproperties [5] can be seen as a unifying framework for many differ- 
ent properties of interest in multiple distinct areas of research. Information-flow 
properties in security and privacy research are hyperproperties [4]. HyperLTL 
subsumes logics that reason over knowledge [4]. Information-flow in distributed 
systems is another example of hyperproperties, and the HyperLTL realizabil- 
ity problem subsumes both the distributed synthesis problem [13,21] as well as 
synthesis of fault-tolerant systems [16]. In circuit verification, the semantic inde- 
pendence of circuit output signals on a certain set of inputs, enabling a range of 
potential optimizations, is a hyperproperty. 


2 Preliminaries 


HyperLTL. HyperLTL [4] is a temporal logic for specifying hyperproperties. 
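It extends LTL by quantification over trace variables $\pi$ and a method to link
atomic propositions to specific traces. The set of trace variables is $\mathcal{V}$. Formulas
in HyperLTL are given by the grammar


$\varphi ::= \forall\pi.\ \varphi \mid \exists\pi.\ \varphi \mid \psi$ , and
$\psi ::= a_\pi \mid \neg\psi \mid \psi \lor \psi \mid \bigcirc\psi \mid \psi\ \mathcal{U}\ \psi$ ,


where $a \in AP$ and $\pi \in \mathcal{V}$. The alphabet of a HyperLTL formula is $2^{AP}$. We allow
the standard boolean connectives $\land$, $\rightarrow$, $\leftrightarrow$ as well as the derived LTL operators
release $\varphi\ \mathcal{R}\ \psi \equiv \neg(\neg\varphi\ \mathcal{U}\ \neg\psi)$, eventually $\Diamond\varphi \equiv \mathit{true}\ \mathcal{U}\ \varphi$, globally $\Box\varphi \equiv \neg\Diamond\neg\varphi$,
and weak until $\varphi\ \mathcal{W}\ \psi \equiv \Box\varphi \lor (\varphi\ \mathcal{U}\ \psi)$.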

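The semantics is given by the satisfaction relation $\models_T$ over a set of traces
$T \subseteq (2^{AP})^\omega$. We define an assignment $\Pi\colon \mathcal{V} \to (2^{AP})^\omega$ that maps trace variables
to traces. $\Pi[i,\infty]$ is the trace assignment that is equal to $\Pi(\pi)[i,\infty]$ for all $\pi$
and denotes the assignment where the first $i$ items are removed from each trace.

$\Pi \models_T a_\pi$ if $a \in \Pi(\pi)[0]$
$\Pi \models_T \neg\varphi$ if $\Pi \not\models_T \varphi$
$\Pi \models_T \varphi \lor \psi$ if $\Pi \models_T \varphi$ or $\Pi \models_T \psi$
$\Pi \models_T \bigcirc\varphi$ if $\Pi[1,\infty] \models_T \varphi$
$\Pi \models_T \varphi\ \mathcal{U}\ \psi$ if $\exists i \geq 0.\ \Pi[i,\infty] \models_T \psi \land \forall 0 \leq j < i.\ \Pi[j,\infty] \models_T \varphi$
$\Pi \models_T \exists\pi.\ \varphi$ if there is some $t \in T$ such that $\Pi[\pi \mapsto t] \models_T \varphi$
$\Pi \models_T \forall\pi.\ \varphi$ if for all $t \in T$ it holds that $\Pi[\pi \mapsto t] \models_T \varphi$


We write $T \models \varphi$ for $\{\} \models_T \varphi$ where $\{\}$ denotes the empty assignment. Two
HyperLTL formulas $\varphi$ and $\psi$ are equivalent, written $\varphi \equiv \psi$, if they have the same
models.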

(In)dependence is a common hyperproperty for which we define the following
syntactic sugar. Given two disjoint subsets of atomic propositions $C \subseteq AP$ and
$A \subseteq AP$, we define independence as the following HyperLTL formula


$D_{A \mapsto C} := \forall\pi\,\forall\pi'.\ \Big(\bigvee_{a \in A} (a_\pi \nleftrightarrow a_{\pi'})\Big)\ \mathcal{R}\ \Big(\bigwedge_{c \in C} (c_\pi \leftrightarrow c_{\pi'})\Big)$    (1)


This guarantees that every proposition $c \in C$ solely depends on propositions $A$.
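
The independence formula (1) can be illustrated on finite trace prefixes. The following Python sketch is not part of the paper: the function name `independent`, the encoding of a trace as a list of frozensets, and the restriction to finite prefixes are assumptions made for illustration only, since HyperLTL itself is defined over infinite traces.

```python
from itertools import product

def independent(traces, A, C):
    """Finite-prefix approximation of D_{A -> C} (illustrative helper).

    For every pair of traces, the C-propositions must agree at each position
    up to and including the first position where the A-propositions differ,
    mirroring the release operator in (1).
    """
    A, C = frozenset(A), frozenset(C)
    for t1, t2 in product(traces, repeat=2):
        for step1, step2 in zip(t1, t2):
            if (step1 & C) != (step2 & C):
                return False      # C differs although A has agreed so far
            if (step1 & A) != (step2 & A):
                break             # A differs here: the obligation is released
    return True

# Two prefixes that agree on c until their a-values diverge.
t1 = [frozenset({"a", "c"}), frozenset({"a"}), frozenset({"c"})]
t2 = [frozenset({"a", "c"}), frozenset(), frozenset()]
# A prefix that differs from t1 on c at position 0 although a agrees there.
t3 = [frozenset({"a"}), frozenset({"a"}), frozenset({"c"})]
```

Under these assumptions, `independent([t1, t2], {"a"}, {"c"})` holds, while adding `t3` violates the property because `c` changes without any change in `a`.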


Strategies. A strategy $f\colon (2^I)^* \to 2^O$ maps sequences of input valuations $2^I$
to an output valuation $2^O$. The behavior of a strategy $f\colon (2^I)^* \to 2^O$ is
characterized by an infinite tree that branches by the valuations of $I$ and whose
nodes $w \in (2^I)^*$ are labeled with the strategic choice $f(w)$. For an infinite
word $w = w_0 w_1 w_2 \cdots \in (2^I)^\omega$, the corresponding labeled path is defined as
$(f(\epsilon) \cup w_0)(f(w_0) \cup w_1)(f(w_0 w_1) \cup w_2) \cdots \in (2^{I \cup O})^\omega$. We lift the set containment
operator $\in$ to the containment of a labeled path $w = w_0 w_1 w_2 \cdots \in (2^{I \cup O})^\omega$ in a
strategy tree induced by $f\colon (2^I)^* \to 2^O$, i.e., $w \in f$ if, and only if, $f(\epsilon) = w_0 \cap O$
and $f((w_0 \cap I) \cdots (w_i \cap I)) = w_{i+1} \cap O$ for all $i \geq 0$. We define the satisfaction of
a HyperLTL formula $\varphi$ (over propositions $I \cup O$) on strategy $f$, written $f \models \varphi$,
as $\{w \mid w \in f\} \models \varphi$. Thus, a strategy $f$ is a model of $\varphi$ if the set of labeled paths
of $f$ is a model of $\varphi$.
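
As a small illustration of labeled paths and of the containment relation between a path and a strategy, consider the following Python sketch. It works on finite prefixes only, and the names `labeled_path`, `contains`, and the example strategy `echo` are hypothetical and not taken from the paper.

```python
def labeled_path(f, inputs):
    """Prefix (f(eps) | w0)(f(w0) | w1)... of the labeled path for a finite input word."""
    return [f(tuple(inputs[:k])) | inputs[k] for k in range(len(inputs))]

def contains(f, path, I, O):
    """Finite-prefix version of the containment of a labeled path in f."""
    I, O = frozenset(I), frozenset(O)
    seen = []                              # input letters read so far
    for letter in path:
        if f(tuple(seen)) != (letter & O):
            return False                   # output does not match the strategy
        seen.append(letter & I)
    return True

def echo(word):
    """Hypothetical strategy: output o iff the previous input contained i."""
    return frozenset({"o"}) if word and "i" in word[-1] else frozenset()

inputs = [frozenset({"i"}), frozenset(), frozenset({"i"})]
path = labeled_path(echo, inputs)
```

Note that the output at position $k$ is computed from the inputs strictly before position $k$, which matches the definition $f(\epsilon) = w_0 \cap O$ above.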


3  HyperLTL Synthesis 


In this section, we identify fragments of HyperLTL for which the realizability 
problem is decidable. Our findings are summarized in Table 1. 


Definition 1 (HyperLTL Realizability). A HyperLTL formula $\varphi$ over
atomic propositions $AP = I \cup O$ is realizable if there is a strategy $f\colon (2^I)^* \to 2^O$
that satisfies $\varphi$.


Table 1. Summary of decidability results.

$\exists^*$ | $\exists^*\forall^1$ | $\forall^*$ | $\forall^*\exists^*$ | linear $\forall^*$
PSPACE-complete | 3EXPTIME | undecidable | undecidable | decidable


We base our investigation on the structure of the quantifier prefix of the HyperLTL
formulas. We call a HyperLTL formula $\varphi$ (quantifier) alternation-free if the
quantifier prefix consists solely of either universal or existential quantifiers. We
denote the corresponding fragments as the (universal) $\forall^*$ and the (existential)
$\exists^*$ fragment, respectively. A HyperLTL formula is in the $\exists^*\forall^*$ fragment if it
starts with arbitrarily many existential quantifiers, followed by arbitrarily many
universal quantifiers; the $\forall^*\exists^*$ fragment is defined analogously. For a given natural
number $n$, we refer to a bounded number of quantifiers with $\forall^n$, respectively $\exists^n$.
The $\forall^1$ realizability problem is equivalent to the LTL realizability problem.


$\exists^*$ Fragment. We show that the realizability problem for existential HyperLTL
is PSPACE-complete. We reduce the realizability problem to the satisfiability
problem for bounded one-alternating $\exists^*\forall^2$ HyperLTL [9], i.e., finding a trace set
$T$ such that $T \models \varphi$.


Lemma 1. An existential HyperLTL formula $\varphi$ is realizable if, and only if,
$\psi := \varphi \land D_{I \mapsto O}$ is satisfiable.


Proof. Assume $f\colon (2^I)^* \to 2^O$ realizes $\varphi$, that is, $f \models \varphi$. Let $T = \{w \mid w \in f\}$ be
the set of traces generated by $f$. It holds that $T \models \varphi$ and $T \models D_{I \mapsto O}$. Therefore,
$\psi$ is satisfiable. Conversely, assume $\psi$ is satisfiable. Let $S$ be a set of traces that
satisfies $\psi$. We construct a strategy $f\colon (2^I)^* \to 2^O$ as


$f(\sigma) = w_{|\sigma|} \cap O$ if $\sigma$ is a prefix of some $w|_I$ with $w \in S$, and
$f(\sigma) = \emptyset$ otherwise,


where $w|_I$ denotes the trace restricted to $I$, formally $w_i \cap I$ for all $i \geq 0$. Note
that if there are multiple candidates $w \in S$, then $w_{|\sigma|} \cap O$ is the same for all
of them because of the required input-determinism $D_{I \mapsto O}$. By construction, all
traces in $S$ are contained in $f$ and, with $S \models \varphi$, it holds that $f \models \varphi$ as $\varphi$ is an
existential formula.
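
The strategy construction in the proof of Lemma 1 can be sketched in Python on finite trace prefixes. This is an illustrative approximation rather than the paper's construction itself: the name `strategy_from_traces` and the list-of-frozensets encoding are assumptions, and the trace set is assumed to be input-deterministic so that the first matching trace is representative.

```python
def strategy_from_traces(S, I, O):
    """Build f from a set S of (finite prefixes of) traces, as in Lemma 1.

    f(sigma) is the output part of position |sigma| of some trace whose
    input projection starts with sigma, and the empty set otherwise.
    """
    I, O = frozenset(I), frozenset(O)

    def f(sigma):
        n = len(sigma)
        for w in S:
            # sigma must be a prefix of w restricted to the inputs I
            if len(w) > n and [step & I for step in w[:n]] == list(sigma):
                return w[n] & O
        return frozenset()

    return f

# Two prefixes with the same initial output o, as input-determinism requires.
w1 = [frozenset({"i", "o"}), frozenset({"o"})]
w2 = [frozenset({"o"}), frozenset()]
f = strategy_from_traces([w1, w2], {"i"}, {"o"})
```

Here `f(())` yields `{o}` for both traces, and the two traces only diverge in their outputs after their inputs have differed.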


Theorem 1. Realizability of existential HyperLTL specifications is decidable. 


Proof. The formula $\psi$ from Lemma 1 is in the $\exists^*\forall^2$ fragment, for which
satisfiability is decidable [9].


Corollary 1. Realizability of $\exists^*$ HyperLTL specifications is PSPACE-complete.


Proof. Given an existential HyperLTL formula, we gave a linear reduction to
the satisfiability of the $\exists^*\forall^2$ fragment in Lemma 1. The satisfiability problem for
a bounded number of universal quantifiers is in PSPACE [9]. Hardness follows
from LTL satisfiability, which is equivalent to the $\exists^1$ fragment.


[Figure 1, panels: (a) An architecture of two processes that specify process $p_1$ to
produce $c$ from $a$ and $p_2$ to produce $d$ from $b$. (b) The same architecture as on the
left, where only the inputs of process $p_2$ are changed to $a$ and $b$.]


Fig. 1. Distributed architectures


$\forall^*$ Fragment. In the following, we will use the distributed synthesis problem,
i.e., the problem whether there is an implementation of processes in a distributed
architecture that satisfies an LTL formula. Formally, a distributed architecture
$A$ is a tuple $\langle P, p_{env}, \mathcal{I}, \mathcal{O} \rangle$ where $P$ is a finite set of processes with distinguished
environment process $p_{env} \in P$. The functions $\mathcal{I}\colon P \to 2^{AP}$ and $\mathcal{O}\colon P \to 2^{AP}$
define the inputs and outputs of processes. While processes may share the same
inputs (in case of broadcasting), the outputs of processes must be pairwise
disjoint, i.e., for all $p \neq p' \in P$ it holds that $\mathcal{O}(p) \cap \mathcal{O}(p') = \emptyset$. W.l.o.g. we assume
that $\mathcal{I}(p_{env}) = \emptyset$. The distributed synthesis problem for architectures without
information forks [13] is decidable. Example architectures are depicted in Fig. 1. 
The architecture in Fig. 1a contains an information fork while the architecture 
in Fig. 1b does not. Furthermore, the processes in Fig. 1b can be ordered linearly 
according to the subset relation on the inputs. 


Theorem 2. The synthesis problem for universal HyperLTL is undecidable. 


Proof. In the $\forall^*$ fragment (and thus in the $\exists^*\forall^*$ fragment), we can encode a
distributed architecture [13], for which LTL synthesis is undecidable. In particular,
we can encode the architecture shown in Fig. 1a. This architecture basically
specifies $c$ to depend only on $a$ and analogously $d$ on $b$. That can be encoded
by $D_{\{a\} \mapsto \{c\}}$ and $D_{\{b\} \mapsto \{d\}}$. The LTL synthesis problem for this architecture is
already shown to be undecidable [13], i.e., given an LTL formula over $I = \{a,b\}$
and $O = \{c,d\}$, we cannot automatically construct processes $p_1$ and $p_2$ that
realize the formula.


Linear $\forall^*$ Fragment. For characterizing the linear fragment of HyperLTL, we
will present a transformation from a formula with arbitrarily many universal
quantifiers to a formula with only one quantifier. This transformation collapses
the universal quantifiers into a single one and renames the path variables
accordingly. For example, $\forall\pi_1\forall\pi_2.\ \Box a_{\pi_1} \lor \Box a_{\pi_2}$ is transformed into an equivalent $\forall^1$
formula $\forall\pi.\ \Box a_\pi \lor \Box a_\pi$. However, this transformation does not always produce
equivalent formulas as $\forall\pi_1\forall\pi_2.\ \Box(a_{\pi_1} \leftrightarrow a_{\pi_2})$ is not equivalent to its collapsed
form $\forall\pi.\ \Box(a_\pi \leftrightarrow a_\pi)$. Let $\varphi$ be $\forall\pi_1 \cdots \forall\pi_n.\ \psi$. We define the collapsed formula
of $\varphi$ as $\mathit{collapse}(\varphi) := \forall\pi.\ \psi[\pi_1 \mapsto \pi][\pi_2 \mapsto \pi] \ldots [\pi_n \mapsto \pi]$ where $\psi[\pi_i \mapsto \pi]$
replaces all occurrences of $\pi_i$ in $\psi$ with $\pi$. Although the collapsed term is not
always equivalent to the original formula, we can use it as an indicator whether
it is possible at all to express a universal formula with only one quantifier as
stated in the following lemma.


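Lemma 2. Either $\varphi \equiv \mathit{collapse}(\varphi)$ or $\varphi$ has no equivalent $\forall^1$ formula.


Proof. Suppose there is some $\psi \in \forall^1$ with $\psi \equiv \varphi$. We show that $\psi \equiv \mathit{collapse}(\varphi)$.
Let $T$ be an arbitrary set of traces. Let $\mathcal{T} = \{\{w\} \mid w \in T\}$. Because $\psi \in \forall^1$,
$T \models \psi$ is equivalent to $\forall T' \in \mathcal{T}.\ T' \models \psi$, which is by assumption equivalent to
$\forall T' \in \mathcal{T}.\ T' \models \varphi$. Now, $\varphi$ operates on singleton trace sets only. This means that
all quantified paths have to be the same, which yields that we can use the same
path variable for all of them. So $\forall T' \in \mathcal{T}.\ T' \models \varphi \Leftrightarrow T' \models \mathit{collapse}(\varphi)$, which is
again equivalent to $T \models \mathit{collapse}(\varphi)$. Because $\psi \equiv \mathit{collapse}(\varphi)$ and $\psi \equiv \varphi$ it holds
that $\varphi \equiv \mathit{collapse}(\varphi)$.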
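
The collapse transformation is purely syntactic, which the following Python sketch illustrates. The nested-tuple formula encoding, the operator tags such as `"G"` and `"or"`, and the function name `collapse` are assumptions chosen for this illustration and do not come from the paper or its tool.

```python
def collapse(trace_vars, body, fresh="pi"):
    """Rename every quantified trace variable in the body to one fresh variable.

    Formulas are nested tuples: ("ap", name, trace_var) for an atomic
    proposition and (operator, child, ...) for every other node.
    Returns the new quantifier prefix and the renamed body.
    """
    def rename(node):
        if node[0] == "ap":
            _, name, var = node
            return ("ap", name, fresh if var in trace_vars else var)
        return (node[0],) + tuple(rename(child) for child in node[1:])

    return [fresh], rename(body)

# forall pi1 forall pi2. G a_pi1 or G a_pi2
body = ("or", ("G", ("ap", "a", "pi1")), ("G", ("ap", "a", "pi2")))
prefix, collapsed = collapse(["pi1", "pi2"], body)
```

The sketch only performs the renaming; deciding whether the collapsed formula is equivalent to the original one requires the equivalence check discussed in the text.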


The LTL realizability problem for distributed architectures without information
forks [13] is decidable. These architectures are in some way linear, i.e., the
processes can be ordered such that lower processes always have a subset of 
the information of upper processes. The linear fragment of universal HyperLTL 
addresses exactly these architectures. 

In the following, we sketch the characterization of the linear fragment of
HyperLTL. Given a formula $\varphi$, we seek variable dependencies of the form
$D_{J \mapsto \{o\}}$ with $J \subseteq I$ and $o \in O$ in the formula. If the part of the formula $\varphi$
that relates multiple paths consists only of such constraints $D_{J \mapsto \{o\}}$ with the
rest being an LTL property, we can interpret $\varphi$ as a description of a distributed
architecture. If, furthermore, the $D_{J_i \mapsto \{o_i\}}$ constraints can be ordered such that
$J_i \subseteq J_{i+1}$ for all $i$, the architecture is linear. There are three steps to check
whether $\varphi$ is in the linear fragment:


1. First, we have to add input-determinism to the formula: $\varphi_{det} := \varphi \land D_{I \mapsto O}$.
This preserves realizability as strategies are input-deterministic.

2. Find for each output variable $o_i \in O$ possible sets of variables $J_i$ that $o_i$ depends
on, such that $J_i \subseteq J_{i+1}$. To check whether the choice of $J$'s is correct, we test
if $\mathit{collapse}(\varphi) \land \bigwedge_{o_i \in O} D_{J_i \mapsto \{o_i\}}$ is equivalent to $\varphi_{det}$. This equivalence check
is decidable as both formulas are in the universal fragment [9].

3. Finally, we construct the corresponding distributed realizability problem
with linear architecture. Formally, we define the distributed architecture
$A = \langle P, p_{env}, \mathcal{I}, \mathcal{O} \rangle$ with $P = \{p_i \mid o_i \in O\} \cup \{p_{env}\}$, $\mathcal{I}(p_i) = J_i$, $\mathcal{O}(p_i) = \{o_i\}$,
and $\mathcal{O}(p_{env}) = I$. The LTL specification for the distributed synthesis problem
is $\mathit{collapse}(\varphi)$.


Definition 2 (linear fragment of $\forall^*$). A formula $\varphi$ is in the linear fragment
of $\forall^*$ iff for all $o_i \in O$ there is a $J_i \subseteq I$ such that
$\varphi \land D_{I \mapsto O} \equiv \mathit{collapse}(\varphi) \land \bigwedge_{o_i \in O} D_{J_i \mapsto \{o_i\}}$ and $J_i \subseteq J_{i+1}$ for all $i$.


Note that each $\forall^1$ formula $\varphi$ (or each $\varphi$ that is collapsible to a $\forall^1$ formula) is in the
linear fragment because we can set all $J_i = I$ and additionally $\mathit{collapse}(\varphi) = \varphi$ holds.

As an example of a formula in the linear fragment of $\forall^*$, consider
$\varphi = \forall\pi, \pi'.\ D_{\{a\} \mapsto \{c\}} \land \Box(c_\pi \leftrightarrow d_\pi) \land \Box(b_\pi \leftrightarrow \bigcirc e_\pi)$ with $I = \{a,b\}$ and $O = \{c,d,e\}$.
The corresponding formula asserting input-determinism is $\varphi_{det} = \varphi \land D_{I \mapsto O}$.
One possible choice of $J$'s is $\{a,b\}$ for $c$, $\{a\}$ for $d$ and $\{a,b\}$ for $e$. Note that
one can use either $\{a,b\}$ or $\{a\}$ for $c$ as $D_{\{a\} \mapsto \{d\}} \land \Box(c_\pi \leftrightarrow d_\pi)$ implies $D_{\{a\} \mapsto \{c\}}$.
However, the apparent alternative $\{b\}$ for $e$ would yield an undecidable
architecture. It holds that $\varphi_{det}$ and $\mathit{collapse}(\varphi) \land D_{\{a,b\} \mapsto \{c\}} \land D_{\{a\} \mapsto \{d\}} \land D_{\{a,b\} \mapsto \{e\}}$
are equivalent and, thus, that $\varphi$ is in the linear fragment.
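
The ordering condition $J_i \subseteq J_{i+1}$ on the dependency sets is easy to check mechanically. The Python sketch below covers only this ordering test, not the equivalence check from step 2; the function name `linear_order` and the dictionary encoding of the choice of $J$'s are assumptions made for illustration.

```python
def linear_order(deps):
    """Order outputs so that their dependency sets form a chain under inclusion.

    deps maps each output to the set of inputs it may depend on.
    Returns the ordered list of outputs, or None if no such chain exists.
    Sorting by size suffices: in a chain, a smaller set is always a subset
    of a larger one, and sets of equal size must be equal.
    """
    ordered = sorted(deps.items(), key=lambda item: len(item[1]))
    for (_, smaller), (_, larger) in zip(ordered, ordered[1:]):
        if not set(smaller) <= set(larger):
            return None
    return [output for output, _ in ordered]

# Choice of J's from the example: {a,b} for c, {a} for d, {a,b} for e.
linear_choice = {"c": {"a", "b"}, "d": {"a"}, "e": {"a", "b"}}
# The alternative {b} for e breaks the chain, since {a} and {b} are incomparable.
forked_choice = {"c": {"a", "b"}, "d": {"a"}, "e": {"b"}}
```

For the first choice the sketch returns an ordering starting with `d`; for the second it returns `None`, in line with the remark that choosing $\{b\}$ for $e$ does not give a linear architecture.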


Theorem 3. The linear fragment of universal HyperLTL is decidable. 


Proof. It holds that $\varphi \equiv \mathit{collapse}(\varphi) \land \bigwedge_{o_i \in O} D_{J_i \mapsto \{o_i\}}$ for some $J_i$'s. The LTL
distributed realizability problem for $\mathit{collapse}(\varphi)$ in the constructed architecture
$A$ is equivalent to the HyperLTL realizability of $\varphi$ as the architecture $A$
represents exactly the input-determinism represented by formula $\bigwedge_{o_i \in O} D_{J_i \mapsto \{o_i\}}$.
The architecture is linear and, thus, the realizability problem is decidable.


$\exists^*\forall^1$ Fragment. In this fragment, we consider arbitrarily many existential path
quantifiers followed by a single universal path quantifier. This fragment turns
out to be still decidable. We solve the realizability problem for this fragment by
reducing it to a decidable fragment of the distributed realizability problem.


Theorem 4. Realizability of $\exists^*\forall^1$ HyperLTL specifications is decidable.


Proof. Let $\varphi$ be $\exists\pi_1 \ldots \exists\pi_n \forall\pi'.\ \psi$. We reduce the realizability problem of $\varphi$ to the
distributed realizability problem for LTL. For every existential path quantifier
$\pi_i$, we introduce a copy of the atomic propositions, written $a_{\pi_i}$ for $a \in AP$.
Intuitively, those select the paths in the strategy tree where the existential path
quantifiers are evaluated. Thus, those propositions (1) have to encode an actual
path in the strategy tree and (2) may not depend on the branching of the strategy
tree. To ensure (1), we add the LTL constraint $\Box(I_{\pi_i} = I_{\pi'}) \rightarrow \Box(O_{\pi_i} = O_{\pi'})$
that asserts that if the inputs correspond to some path in the strategy tree,
the outputs on those paths have to be the same. Property (2) is guaranteed
by the distributed architecture: the processes generating the propositions $a_{\pi_i}$
do not depend on the environment output. The resulting architecture $A_\varphi$ is
$\langle \{p_{env}, p, p'\}, p_{env}, \{p \mapsto \emptyset, p' \mapsto I_{\pi'}\}, \{p_{env} \mapsto I_{\pi'}, p \mapsto \bigcup_{1 \leq i \leq n} O_{\pi_i} \cup I_{\pi_i}, p' \mapsto O_{\pi'}\} \rangle$.
It is easy to verify that $A_\varphi$ does not contain an information fork, thus the
realizability problem is decidable. The LTL specification $\theta$ is
$\psi \land \bigwedge_{1 \leq i \leq n} \Box(I_{\pi_i} = I_{\pi'}) \rightarrow \Box(O_{\pi_i} = O_{\pi'})$. The implementation of process $p'$ (if it exists) is a model
for the HyperLTL formula (process $p$ producing witness for the $\exists$ quantifier).
Conversely, a model for $\varphi$ can be used as an implementation of $p'$. Thus, the
distributed synthesis problem $\langle A_\varphi, \theta \rangle$ has a solution if, and only if, $\varphi$ is realizable.


$\forall^*\exists^*$ Fragment. The last fragment to consider consists of formulas in the $\forall^*\exists^*$
fragment. Whereas the $\exists^*\forall^1$ fragment remains decidable, the realizability problem
of $\forall^*\exists^*$ turns out to be undecidable even when restricted to only one quantifier
of each sort ($\forall^1\exists^1$).


Theorem 5. Realizability of $\forall^*\exists^*$ HyperLTL is undecidable.


Proof. The proof is done via reduction from Post's Correspondence Problem
(PCP) [22]. The basic idea follows the proof in [9].


4 Bounded Realizability 


We propose an algorithm to synthesize strategies from specifications given in 
universal HyperLTL by searching for finite generators of realizing strategies. We 
encode this search as a satisfiability problem for a decidable constraint system. 


Transition Systems. A transition system $\mathcal{S}$ is a tuple $\langle S, s_0, \tau, l \rangle$ where $S$ is a
finite set of states, $s_0 \in S$ is the designated initial state, $\tau\colon S \times 2^I \to S$ is the
transition function, and $l\colon S \to 2^O$ is the state-labeling or output function. We
generalize the transition function to sequences over $2^I$ by defining $\tau^*\colon (2^I)^* \to S$
recursively as $\tau^*(\epsilon) = s_0$ and $\tau^*(w_0 \cdots w_{n-1} w_n) = \tau(\tau^*(w_0 \cdots w_{n-1}), w_n)$
for $w_0 \cdots w_{n-1} w_n \in (2^I)^+$. A transition system $\mathcal{S}$ generates the strategy $f$ if
$f(w) = l(\tau^*(w))$ for every $w \in (2^I)^*$. A strategy $f$ is called finite-state if there
exists a transition system that generates $f$.
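
The relation between a transition system and the strategy it generates can be made concrete with a short Python sketch. The class name `TransitionSystem`, the dictionary encoding of $\tau$ and $l$, and the two-state example are assumptions for illustration; they are not taken from BoSyHyper.

```python
class TransitionSystem:
    """Finite transition system with state labels (illustrative encoding)."""

    def __init__(self, initial, tau, label):
        self.initial = initial   # s_0
        self.tau = tau           # dict: (state, input valuation) -> state
        self.label = label       # dict: state -> output valuation

    def tau_star(self, word):
        """Extension of tau to finite input words, starting in s_0."""
        state = self.initial
        for letter in word:
            state = self.tau[(state, letter)]
        return state

    def strategy(self, word):
        """The generated strategy f(w) = l(tau*(w))."""
        return self.label[self.tau_star(word)]

EMPTY, I_SET = frozenset(), frozenset({"i"})
# Hypothetical one-bit memory: output o iff the last input contained i.
memory = TransitionSystem(
    initial="s0",
    tau={("s0", EMPTY): "s0", ("s0", I_SET): "s1",
         ("s1", EMPTY): "s0", ("s1", I_SET): "s1"},
    label={"s0": frozenset(), "s1": frozenset({"o"})},
)
```

Because the label depends on the state only, the output for the empty word is fixed by $s_0$, which corresponds to the root labeling of the strategy tree.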


Overview. We first sketch the synthesis procedure and then proceed with a
description of the intermediate steps. Let $\varphi$ be a universal HyperLTL formula
$\forall\pi_1 \cdots \forall\pi_n.\ \psi$. We build the automaton $\mathcal{A}_\psi$ whose language is the set of tuples
of traces that satisfy $\psi$. We then define the acceptance of a transition system $\mathcal{S}$
on $\mathcal{A}_\psi$ by means of the self-composition of $\mathcal{S}$. Lastly, we encode the existence of
a transition system accepted by $\mathcal{A}_\psi$ as an SMT constraint system.


Example 1. Throughout this section, we will use the following (simplified) run- 
ning example. Assume we want to synthesize a system that keeps decisions secret 
until it is allowed to publish. Thus, our system has three input signals decision, 
indicating whether a decision was made, the secret value, and a signal to publish 
results. Furthermore, our system has two outputs, a high output internal that 
stores the value of the last decision, and a low output result that indicates the 
result. No information about decisions should be inferred until publication. To 
specify the functionality, we propose the LTL specification 


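$\Box(\mathit{decision} \rightarrow (\mathit{value} \leftrightarrow \bigcirc\,\mathit{internal}))$
$\land\ \Box(\neg\mathit{decision} \rightarrow (\mathit{internal} \leftrightarrow \bigcirc\,\mathit{internal}))$
$\land\ \Box(\mathit{publish} \rightarrow \bigcirc(\mathit{internal} \leftrightarrow \mathit{result}))$ . (2)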


The solution produced by the LTL synthesis tool BoSy [8], shown in Fig. 2, clearly 
violates our intention that results should be secret until publish: Whenever a 
decision is made, the output result changes as well. 

We formalize the property that no information about the decision can be 
inferred from result until publication as the HyperLTL formula 


$\forall\pi\,\forall\pi'.\ (\mathit{publish}_\pi \lor \mathit{publish}_{\pi'})\ \mathcal{R}\ (\mathit{result}_\pi \leftrightarrow \mathit{result}_{\pi'})$ . (3)


[Figure 2, panels (a) and (b): state diagrams of the synthesized transition systems.]


Fig. 2. Synthesized solutions for Example 1.


It asserts that for every pair of traces, the result signals have to be the same until 
(if ever) there is a publish signal on either trace. A solution satisfying both, the 
functional specification and the hyperproperty, is shown in Fig. 2. The system
switches states whenever there is a decision with a different value than before 
and only exposes the decision in case there is a prior publish command. 


We proceed with introducing the necessary preliminaries for our algorithm. 


Automata. A universal co-Büchi automaton $\mathcal{A}$ over a finite alphabet $\Sigma$ is a tuple
$\langle Q, q_0, \delta, F \rangle$, where $Q$ is a finite set of states, $q_0 \in Q$ is the designated initial state,
$\delta \subseteq Q \times 2^\Sigma \times Q$ is the transition relation, and $F \subseteq Q$ is the set of rejecting states.
Given an infinite word $\sigma = \sigma_0 \sigma_1 \sigma_2 \cdots \in (2^\Sigma)^\omega$, a run of $\sigma$ on $\mathcal{A}$ is an infinite
path $q_0 q_1 q_2 \cdots \in Q^\omega$ where for all $i \geq 0$ it holds that $(q_i, \sigma_i, q_{i+1}) \in \delta$. A run
is accepting if it contains only finitely many rejecting states. $\mathcal{A}$ accepts a word
$\sigma$ if all runs of $\sigma$ on $\mathcal{A}$ are accepting. The language of $\mathcal{A}$, written $\mathcal{L}(\mathcal{A})$, is
the set $\{\sigma \in (2^\Sigma)^\omega \mid \mathcal{A} \text{ accepts } \sigma\}$. We represent automata as directed graphs
with vertex set $Q$ and a symbolic representation of the transition relation $\delta$
as propositional boolean formulas $\mathbb{B}(\Sigma)$. The rejecting states in $F$ are marked
by double lines. The automata for the LTL and HyperLTL specifications from
Example 1 are depicted in Fig. 3.


Run Graph. The run graph of a transition system S = (S, so, T, l) on a universal 
co-Büchi automaton A = (Q, qo, 6, F) is a directed graph (V, E) where V = SxQ 
is the set of vertices and E C V x V is the edge relation with 

$((s, q), (s', q')) \in E$ iff
$\exists i \in 2^I.\ \exists o \in 2^O.\ (\tau(s, i) = s') \land (l(s) = o) \land (q, i \cup o, q') \in \delta$ .


A run graph is accepting if every path (starting at the initial vertex (so, qo)) has 
only finitely many visits of rejecting states. To show acceptance, we annotate 
every reachable node in the run graph with a natural number m, such that any 
path, starting in the initial state, contains less than m visits of rejecting states. 
Such an annotation exists if, and only if, the run graph is accepting [14]. 
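
The run graph and its acceptance condition can be explored with a small explicit-state Python sketch. This is an illustration only: the function names `run_graph` and `is_accepting`, the explicit enumeration of letters, and the example automaton for "always o" are assumptions, and the cycle check below replaces the annotation used in the text by an equivalent test on finite graphs.

```python
def run_graph(ts_initial, tau, label, inputs, aut_initial, delta):
    """Reachable part of the run graph as a dict: vertex -> set of successors.

    tau: (s, i) -> s', label: s -> o, delta: set of triples (q, letter, q')
    where a letter is the union of an input and an output valuation.
    """
    start = (ts_initial, aut_initial)
    edges, todo, seen = {}, [start], {start}
    while todo:
        s, q = todo.pop()
        successors = {(tau[(s, i)], q2)
                      for i in inputs
                      for (q1, letter, q2) in delta
                      if q1 == q and letter == (i | label[s])}
        edges[(s, q)] = successors
        for vertex in successors - seen:
            seen.add(vertex)
            todo.append(vertex)
    return edges

def is_accepting(edges, rejecting):
    """Accepting iff no reachable vertex with a rejecting state lies on a cycle."""
    def reachable_from(vertex):
        stack, seen = list(edges[vertex]), set()
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(edges[current])
        return seen
    return not any(q in rejecting and (s, q) in reachable_from((s, q))
                   for (s, q) in edges)

EMPTY, I_SET = frozenset(), frozenset({"i"})
letters = [frozenset(x) for x in ([], ["i"], ["o"], ["i", "o"])]
# Hypothetical automaton for "always o": q1 is a rejecting sink.
delta = ({("q0", a, "q0" if "o" in a else "q1") for a in letters}
         | {("q1", a, "q1") for a in letters})
tau = {("s", EMPTY): "s", ("s", I_SET): "s"}
good = run_graph("s", tau, {"s": frozenset({"o"})}, [EMPTY, I_SET], "q0", delta)
bad = run_graph("s", tau, {"s": frozenset()}, [EMPTY, I_SET], "q0", delta)
```

On a finite graph, a path visits rejecting states infinitely often exactly when some reachable rejecting vertex lies on a cycle, which is why the cycle test coincides with the acceptance condition stated above.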
[Figure 3, panels: (a) Automaton accepting language defined by LTL formula in (2).
(b) Automaton accepting language defined by HyperLTL formula in (3).]


Fig. 3. Universal co-Büchi automata recognizing the languages from Example 1.


Self-composition. The model checking of universal HyperLTL formulas [12] is
based on self-composition. Let $prj_i$ be the projection to the $i$-th element of a
tuple. Let $zip$ denote the usual function that maps an $n$-tuple of sequences to a
single sequence of $n$-tuples, for example, $zip([1, 2, 3], [4, 5, 6]) = [(1, 4), (2, 5), (3, 6)]$,
and let $unzip$ denote its inverse. The transition system $\mathcal{S}^n$ is the $n$-fold
self-composition of $\mathcal{S} = \langle S, s_0, \tau, l \rangle$, if $\mathcal{S}^n = \langle S^n, s_0^n, \tau', l^n \rangle$ and for all $s, s' \in S^n$,
$\alpha \in (2^I)^n$, and $\beta \in (2^O)^n$ we have that $\tau'(s, \alpha) = s'$ and $l^n(s) = \beta$ iff for all
$1 \leq i \leq n$, it holds that $\tau(prj_i(s), prj_i(\alpha)) = prj_i(s')$ and $l(prj_i(s)) = prj_i(\beta)$.
If $T$ is the set of traces generated by $\mathcal{S}$, then $\{zip(t_1, \ldots, t_n) \mid t_1, \ldots, t_n \in T\}$ is
the set of traces generated by $\mathcal{S}^n$.
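
The $n$-fold self-composition is a componentwise product, as the following Python sketch shows. The function name `self_compose`, the dictionary encoding, and the two-state example system are assumptions made for illustration.

```python
from itertools import product

def self_compose(states, initial, tau, label, inputs, n):
    """n-fold self-composition: all components move independently."""
    states_n = list(product(states, repeat=n))
    initial_n = (initial,) * n
    # tau'(s, alpha) applies tau in every component
    tau_n = {(s, alpha): tuple(tau[(s[k], alpha[k])] for k in range(n))
             for s in states_n
             for alpha in product(inputs, repeat=n)}
    # l^n(s) collects the labels of all components
    label_n = {s: tuple(label[s[k]] for k in range(n)) for s in states_n}
    return states_n, initial_n, tau_n, label_n

EMPTY, I_SET = frozenset(), frozenset({"i"})
tau = {("s0", EMPTY): "s0", ("s0", I_SET): "s1",
       ("s1", EMPTY): "s0", ("s1", I_SET): "s1"}
label = {"s0": frozenset(), "s1": frozenset({"o"})}
states_2, initial_2, tau_2, label_2 = self_compose(
    ["s0", "s1"], "s0", tau, label, [EMPTY, I_SET], 2)
```

The state space grows as $|S|^n$, so the explicit enumeration above is only practical for very small systems and few quantifiers.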

We construct the universal co-Büchi automaton $\mathcal{A}_\psi$ such that the language
of $\mathcal{A}_\psi$ is the set of words $w$ such that $unzip(w) = \Pi$ and $\Pi \models_\emptyset \psi$, i.e., the tuple of
traces that satisfy $\psi$. We get this automaton by dualizing the non-deterministic
Büchi automaton for $\neg\psi$ [4], i.e., changing the branching from non-deterministic
to universal and the acceptance condition from Büchi to co-Büchi. Hence, $\mathcal{S}$
satisfies a universal HyperLTL formula $\varphi = \forall\pi_1 \ldots \forall\pi_n.\ \psi$ if the traces generated
by the self-composition $\mathcal{S}^n$ are a subset of $\mathcal{L}(\mathcal{A}_\psi)$.


Lemma 3. A transition system $\mathcal{S}$ satisfies the universal HyperLTL formula
$\varphi = \forall\pi_1 \cdots \forall\pi_n.\ \psi$ if the run graph of $\mathcal{S}^n$ and $\mathcal{A}_\psi$ is accepting.


Synthesis. Let $\mathcal{S} = \langle S, s_0, \tau, l \rangle$ and $\mathcal{A}_\psi = \langle Q, q_0, \delta, F \rangle$. We encode the synthesis
problem as an SMT constraint system. To this end, we use uninterpreted function
symbols to encode the transition system and the annotation. For the transition
system, those functions are the transition function $\tau\colon S \times 2^I \to S$ and the
labeling function $l\colon S \to 2^O$. The annotation is split into two parts, a reachability
constraint $\lambda^{\mathbb{B}}\colon S^n \times Q \to \mathbb{B}$ indicating whether a state in the run graph is
reachable and a counter $\lambda^{\#}\colon S^n \times Q \to \mathbb{N}$ that maps every reachable vertex
to the maximal number of rejecting states visited by any path starting in the
initial vertex. The resulting constraint asserts that there is a transition system
with accepting run graph.

$\forall s, s' \in S^n.\ \forall q, q' \in Q.\ \forall i \in (2^I)^n.$
$\big(\lambda^{\mathbb{B}}(s, q) \land \tau'(s, i) = s' \land (q, i \cup l^n(s), q') \in \delta\big) \rightarrow \lambda^{\mathbb{B}}(s', q') \land \lambda^{\#}(s', q') \trianglerighteq \lambda^{\#}(s, q)$


where $\trianglerighteq$ is $>$ if $q' \in F$ and $\geq$ otherwise.


Theorem 6. The constraint system is satisfiable with bound b if, and only if, 
there is a transition system S of size b that realizes the HyperLTL formula. 


We extract a realizing implementation by asking the satisfiability solver to gen- 
erate a model for the uninterpreted functions that encode the transition system. 
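
The role of the counter annotation can be illustrated without an SMT solver by computing it for a given run graph. The Python sketch below is not the paper's encoding: the function name `annotate`, the edge-dictionary format, and the fixpoint computation are assumptions, and the graph is assumed to contain a key for every reachable vertex.

```python
def annotate(edges, initial, rejecting):
    """Least counter annotation for a run graph, or None if none exists.

    edges: dict mapping each reachable vertex (s, q) to its successor set.
    The result lam satisfies lam[v'] > lam[v] if v' has a rejecting state
    and lam[v'] >= lam[v] otherwise, for every edge (v, v').
    """
    lam = {initial: 0}
    # On an accepting graph no path meets a rejecting vertex twice, so the
    # number of rejecting vertices bounds every counter value.
    bound = sum(1 for (_, q) in edges if q in rejecting)
    changed = True
    while changed:
        changed = False
        for vertex, successors in edges.items():
            if vertex not in lam:
                continue
            for succ in successors:
                need = lam[vertex] + (1 if succ[1] in rejecting else 0)
                if lam.get(succ, -1) < need:
                    if need > bound:
                        return None      # rejecting vertex on a cycle
                    lam[succ] = need
                    changed = True
    return lam

# One rejecting state q1 that is visited once and then left for good.
edges_ok = {("s", "q0"): {("s", "q1")},
            ("s", "q1"): {("s", "q2")},
            ("s", "q2"): {("s", "q2")}}
# The rejecting state q1 has a self-loop, so no annotation exists.
edges_bad = {("s", "q0"): {("s", "q1")},
             ("s", "q1"): {("s", "q1")}}
```

In the SMT encoding the solver searches for the transition system and the annotation simultaneously; this sketch only checks the annotation for a fixed system.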


5 Bounded Unrealizability 


So far, we focused on the positive case, providing an algorithm for finding small
solutions, if they exist. In this section, we shift to the case of detecting if a
universal HyperLTL formula is unrealizable. We adapt the definition of
counterexamples to realizability for LTL [15] to HyperLTL in the following. Let $\varphi$
be a universal HyperLTL formula $\forall\pi_1 \cdots \forall\pi_n.\ \psi$ over inputs $I$ and outputs $O$.
A counterexample to realizability is a set of input traces $P \subseteq (2^I)^\omega$ such that
for every strategy $f\colon (2^I)^* \to 2^O$ the labeled traces $P^f \subseteq (2^{I \cup O})^\omega$ satisfy
$\neg\varphi = \exists\pi_1 \cdots \exists\pi_n.\ \neg\psi$.


Proposition 1. A universal HyperLTL formula $\varphi = \forall\pi_1 \cdots \forall\pi_n.\ \psi$ is unrealizable
if there is a counterexample $P$ to realizability.


Proof. For contradiction, we assume $\varphi$ is realizable by a strategy $f$. As $P$ is
a counterexample to realizability, we know $P^f \models \exists\pi_1 \cdots \exists\pi_n.\ \neg\psi$. This means
that there exists an assignment $\Pi_P \in \mathcal{V} \to P^f$ with $\Pi_P \models_{P^f} \neg\psi$. Equivalently,
$\Pi_P \not\models_{P^f} \psi$. Therefore, not all assignments $\Pi \in \mathcal{V} \to P^f$ satisfy $\Pi \models_{P^f} \psi$, which
implies $P^f \not\models \forall\pi_1 \cdots \forall\pi_n.\ \psi = \varphi$. Since $\varphi$ is universal, we can infer $f \not\models \varphi$, which
contradicts the assumption. Thus, $\varphi$ is unrealizable.


Despite being independent of strategy trees, there are in many cases finite
representations of $P$. Consider, for example, the unrealizable specification
$\varphi_1 = \forall\pi\,\forall\pi'.\ \Diamond(i_\pi \leftrightarrow i_{\pi'})$, where the set $P_1 = \{\emptyset^\omega, \{i\}^\omega\}$ is a counterexample to
realizability. As a second example, consider $\varphi_2 = \forall\pi\,\forall\pi'.\ \Box(o_\pi \leftrightarrow o_{\pi'}) \land \Box(i_\pi \leftrightarrow \bigcirc o_\pi)$
with conflicting requirements on $o$. $P_1$ is a counterexample to realizability for
$\varphi_2$ as well: By choosing a different valuation of $i$ in the first step, the system is
forced to either react with different valuations of $o$ (violating the first conjunct), or
not correctly repeat the initial value of $i$ (violating the second conjunct).

There are, however, already linear specifications where the set of counterexample
paths is not finite and depends on the strategy tree [16]. For example, the
specification $\forall\pi.\ \Diamond(i_\pi \leftrightarrow o_\pi)$ is unrealizable as the system cannot predict future
values of the environment. There is no finite set of traces witnessing this: For
every finite set of traces, there is a strategy tree such that $\Diamond(i_\pi \leftrightarrow o_\pi)$ holds on
every such trace. On the other hand, there is a simple counterexample strategy,
that is, a strategy that observes output sequences and produces inputs. In this
example, the counterexample strategy inverts the outputs given by the system,
thus it is guaranteed that $\Box(i \nleftrightarrow o)$ for any system strategy.
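
The output-inverting counterexample strategy can be simulated against any finite-state system. The following Python sketch is an illustration under assumptions: the function name `play_inverting_environment`, the dictionary encoding of the system, and the two-state example are not from the paper.

```python
def play_inverting_environment(initial, tau, label, steps):
    """Let the environment answer every output with the inverted input.

    The system moves first in each step (its output is the label of the
    current state); the environment then sets i iff the system did not set o.
    """
    state, trace = initial, []
    for _ in range(steps):
        out = label[state]
        inp = frozenset() if "o" in out else frozenset({"i"})
        trace.append(inp | out)
        state = tau[(state, inp)]
    return trace

EMPTY, I_SET = frozenset(), frozenset({"i"})
# Hypothetical system that tries to repeat the previous input as output.
tau = {("s0", EMPTY): "s0", ("s0", I_SET): "s1",
       ("s1", EMPTY): "s0", ("s1", I_SET): "s1"}
label = {"s0": frozenset(), "s1": frozenset({"o"})}
trace = play_inverting_environment("s0", tau, label, 6)
```

On every position of the resulting prefix exactly one of `i` and `o` holds, so the input and output never coincide, whatever transition system is plugged in.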

We combine those two approaches, selecting counterexample paths and using 
strategic behavior. A $k$-counterexample strategy for HyperLTL observes $k$ output
sequences and produces $k$ inputs, where $k$ is a new parameter ($k \geq n$). The
counterexample strategy is winning if (1) either the traces given by the system 
player do not correspond to a strategy, or (2) the body of the HyperLTL is 
violated for any n subset of the k traces. Regarding property (1), consider the 
two traces where the system player produces different outputs initially. Clearly, 
those two traces cannot be generated by any system strategy since the initial 
state (root labeling) is fixed. 
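
Property (1) can be made concrete with a short Python sketch that checks, on finite prefixes, whether a set of traces could stem from a single system strategy. The sketch assumes a Moore-style system (the output at step t depends only on inputs before t); the name `consistent_with_some_strategy` and the trace encoding are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# A finite trace prefix: (inputs, outputs), two equally long Boolean sequences.
Trace = Tuple[Sequence[bool], Sequence[bool]]


def consistent_with_some_strategy(traces: List[Trace]) -> bool:
    """Return False if two traces agree on all earlier inputs but differ in an output."""
    for (in_a, out_a), (in_b, out_b) in combinations(traces, 2):
        for t in range(min(len(out_a), len(out_b))):
            same_history = list(in_a[:t]) == list(in_b[:t])
            if same_history and out_a[t] != out_b[t]:
                return False
    return True


# Different initial outputs: the empty input history is shared, so no
# single strategy (with a fixed root labeling) can produce both traces.
assert not consistent_with_some_strategy([([False], [False]), ([False], [True])])

# Outputs may differ once the input histories have diverged.
assert consistent_with_some_strategy(
    [([False, False], [False, False]), ([True, True], [False, True])]
)
```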

The search for a k-counterexample strategy can be reduced to LTL synthesis 
using k-tuple input propositions $O^k$, k-tuple output propositions $I^k$, and the 
specification 

$$\neg D_{I^k \mapsto O^k} \;\lor \bigvee_{P \subseteq \{1,\dots,k\} \text{ with } |P| = n} \neg\psi[P]$$

where $\psi[P]$ denotes the replacement of $a_{\pi_i}$ by the $P_i$th position of the combined 
input/output k-tuple. 


Theorem 7. A universal HyperLTL formula $\varphi = \forall\pi_1 \cdots \forall\pi_n.\ \psi$ is unrealizable 
if there is a k-counterexample strategy for some $k \ge n$. 


6 Evaluation 


We implemented a prototype synthesis tool, called BoSyHyper¹, for universal 
HyperLTL based on the bounded synthesis algorithm described in Sect. 4. 
Furthermore, we implemented the search for counterexamples proposed in Sect. 5. 
Thus, BoSyHyper is able to characterize realizability and unrealizability of 
universal HyperLTL formulas. 

We base our implementation on the LTL synthesis tool BoSy [8]. For effi- 
ciency, we split the specifications into two parts, a part containing the linear 
(LTL) specification, and a part containing the hyperproperty given as HyperLTL 
formula. Consequently, we build two constraint systems, one using the standard 
bounded synthesis approach [14] and one using the approach described in Sect. 4. 
Before solving, those constraints are combined into a single SMT query. This 
results in a much more concise constraint system compared to the one where 
the complete specification is interpreted as a HyperLTL formula. For solving the 
SMT queries, we use the Z3 solver [20]. We continue by describing the bench- 
marks used in our experiments. 


1 BoSyHyper is available at https://www.react.uni-saarland.de/tools/bosy/. 


Fig. 4. Synthesized solutions of the mutual exclusion protocols: (a) non-symmetric 
solution, (b) counterexample to symmetry, (c) symmetry-breaking solution. 


Symmetric Mutual Exclusion. Our first example demonstrates the ability to 
specify symmetry in HyperLTL for a simple mutual exclusion protocol. Let 
$r_1$ and $r_2$ be input signals representing mutual exclusive requests to a 
critical section and $g_1$/$g_2$ the respective grants to enter the section. Every request 
should be answered eventually, $\square(r_i \to \Diamond g_i)$ for $i \in \{1,2\}$, but not at the 
same time, $\square\neg(g_1 \land g_2)$. The minimal LTL solution is depicted in Fig. 4a. 
It is well known that no mutex protocol can ensure perfect symmetry [19], 
thus when adding the symmetry constraint specified by the HyperLTL formula 
$\forall\pi\,\forall\pi'.\ (r_{1\pi} \nleftrightarrow r_{2\pi'})\ \mathcal{R}\ (g_{1\pi} \leftrightarrow g_{2\pi'})$ the formula becomes unrealizable. Our tool 
produces the counterexample shown in Fig. 4b. By adding another input signal 
tie that breaks the symmetry in case of simultaneous requests and modifying 
the symmetry constraint to 
$\forall\pi\,\forall\pi'.\ ((r_{1\pi} \nleftrightarrow r_{2\pi'}) \lor (tie_\pi \nleftrightarrow \neg tie_{\pi'}))\ \mathcal{R}\ (g_{1\pi} \leftrightarrow g_{2\pi'})$ 
we obtain the solution depicted in Fig. 4c. We further evaluated the same 
properties on a version that forbids spurious grants; these instances are reported 
in Table 2 with prefix full. 


Distributed and Fault-Tolerant Systems. In Sect. 3 we presented a reduction of 
arbitrary distributed architectures to HyperLTL. As an example for our evalu- 
ation, consider a setting with two processes, one for encoding input signals and 
one for decoding. Both processes can be synthesized simultaneously using a sin- 
gle HyperLTL specification. The (linear) correctness condition states that the 
decoded signal is always equal to the inputs given to the encoder. Furthermore, 
the encoder and decoder should solely depend on the inputs and the encoded 
signal, respectively. Additionally, we can specify desired properties about the 
encoding like fault-tolerance [16] or Hamming distance of code words [12]. The 
results are reported in Table 2 where i-j-x means i input bits, j encoded bits, 
and x represents the property. The property is either tolerance against a single 
Byzantine signal failure or a guaranteed Hamming distance of code words. 
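
As a rough plausibility check of the Hamming-distance instances, the following Python sketch brute-forces whether any set of code words with the required minimum distance exists at all. This is a static combinatorial check under the assumption that "hamming-d" asks for pairwise distance at least d between the code words of distinct inputs; it is not the synthesis procedure, and `code_exists` is a hypothetical helper.

```python
from itertools import combinations, product


def hamming(u, v) -> int:
    """Number of positions in which two equally long words differ."""
    return sum(a != b for a, b in zip(u, v))


def code_exists(input_bits: int, encoded_bits: int, min_distance: int) -> bool:
    """Is there a set of 2**input_bits code words with pairwise distance >= min_distance?"""
    words = list(product((0, 1), repeat=encoded_bits))
    needed = 2 ** input_bits
    for candidate in combinations(words, needed):
        if all(hamming(u, v) >= min_distance for u, v in combinations(candidate, 2)):
            return True
    return False


# Instance names follow the i-j-x scheme: i input bits, j encoded bits.
assert code_exists(1, 2, 2)        # e.g. {00, 11}
assert not code_exists(2, 2, 2)    # all four 2-bit words are needed
assert code_exists(2, 3, 2)        # e.g. the even-parity words
assert not code_exists(2, 3, 3)    # at most two 3-bit words at distance 3
```

Under this reading, the four outcomes agree with the realizable/unrealizable verdicts listed for the corresponding hamming instances in Table 2.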


CAP Theorem. The CAP Theorem due to Brewer [2] states that it is impossible 
to design a distributed system that provides Consistency, Availability, and 
Partition tolerance (CAP) simultaneously. This example has been considered 
before [16] to evaluate a technique that could automatically detect unrealizability. 
However, when we drop either Consistency, Availability, or Partition tolerance, 
the corresponding instances (AP, CP, and CA) become realizable, which 
the previous work was not able to prove. We show that our implementation 
can show both the unrealizability of CAP and the realizability of AP, CP, and CA. 
In contrast to the previous encoding [16], we are not limited to acyclic architectures. 


Long-term Information-flow. Previous work on model-checking hyperproperties 
[12] found that an implementation for the commonly used I2C bus protocol 
could remember input values ad infinitum. For example, it could not be verified 
that information given to the implementation eventually leaves it, i.e., is 
forgotten. This is especially unfortunate in high security contexts. We consider 
a simple bus protocol which is inspired by the widely used I2C protocol. Our 
example protocol has the inputs send for initiating a transmission, in for the 
value that should be transferred, and an acknowledgment bit indicating successful 
transmission. The bus master waits in an idle state until a send is received. 
Afterwards, it transmits a header sequence, followed by the value of in, waits for 
an acknowledgment and then indicates success or failure to the sender before 
returning to the idle state. We specify the property that the input has no influence 
on the data that is sent, which is obviously violated (instance NI1). As a 
second property, we check that this information leak cannot last arbitrarily 
long (NI2), for which there is a realizing implementation. 


Dining Cryptographers. Recall the dining cryptographers problem introduced 
earlier. This benchmark is interesting as it contains two types of hyperproperties. 
First, there is information-flow between the three cryptographers, where 
some secrets ($s_{ab}$, $s_{ac}$, $s_{bc}$) are shared between pairs of cryptographers. In the 
formalization, we have 4 entities: three processes describing the 3 cryptographers 
($out_i$) and one process computing the result ($p_g$), i.e., whether the group 
has paid or not, from $out_i$. Second, the final result should only disclose whether 
one of the cryptographers has paid or the NSA. This can be formalized as an 
indistinguishability property between different executions. For example, when 
we compare the two traces $\pi$ and $\pi'$ where $C_a$ has paid on $\pi$ and $C_b$ has paid 
on $\pi'$, then the outputs of both have to be the same if their common secret 
$s_{ab}$ is different on those two traces (while all other secrets $s_{ac}$ and $s_{bc}$ are the 
same). This ensures that for an outside observer, a flipped output can be either 
the result of a different shared secret or due to the announcement. Lastly, the linear 
specification asserts that $p_g \leftrightarrow \neg p_{NSA}$. 
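
The two requirements can be checked exhaustively on the classic XOR-based protocol of Chaum [3]. The Python sketch below is an illustration under that assumption and is not necessarily the implementation synthesized by the tool; the function names `announcements` and `group_paid` are hypothetical.

```python
from itertools import product


def announcements(s_ab: int, s_ac: int, s_bc: int,
                  paid_a: int, paid_b: int, paid_c: int):
    """Each cryptographer announces the XOR of its two shared secrets and its paid bit."""
    out_a = s_ab ^ s_ac ^ paid_a
    out_b = s_ab ^ s_bc ^ paid_b
    out_c = s_ac ^ s_bc ^ paid_c
    return out_a, out_b, out_c


def group_paid(outs) -> int:
    """Result process: every secret occurs twice and cancels out in the XOR."""
    return outs[0] ^ outs[1] ^ outs[2]


for s_ab, s_ac, s_bc in product((0, 1), repeat=3):
    # Linear requirement: the group result is 1 iff a cryptographer (not the NSA) paid.
    assert group_paid(announcements(s_ab, s_ac, s_bc, 0, 0, 0)) == 0
    for payer in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
        assert group_paid(announcements(s_ab, s_ac, s_bc, *payer)) == 1

    # Indistinguishability: C_a paying with secret s_ab looks exactly like
    # C_b paying with the flipped secret (all other secrets unchanged).
    on_pi = announcements(s_ab, s_ac, s_bc, 1, 0, 0)
    on_pi_prime = announcements(1 - s_ab, s_ac, s_bc, 0, 1, 0)
    assert on_pi == on_pi_prime
```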


Results. Table 2 reports on the results of the benchmarks. We distinguish 
between state-labeled (Moore) and transition-labeled (Mealy) transition systems. 
Note that the counterexample strategies use the opposite transition system, 
i.e., a Mealy system strategy corresponds to a state-labeled (Moore) 
environment strategy. Typically, Mealy strategies are more compact, i.e., need 
smaller transition systems, and this is confirmed by our experiments. BoSyHyper 
is able to solve most of the examples, providing realizing implementations or 
counterexamples. Regarding the unrealizable benchmarks, we observe that usually 
two simultaneously generated paths (k = 2) are enough, with the exception 
of the encoder example. Overall the results are encouraging, showing that we can 
solve a variety of instances with non-trivial information-flow. 


Table 2. Results of BoSyHyper on the benchmark sets described in Sect. 6. They ran 
on a machine with a dual-core Core i7, 3.3 GHz, and 16 GB memory. 


Benchmark Instance Result States Time[sec.] 
Moore | Mealy | Moore | Mealy 
Symmetric Mutex non-sym realizable 2 1.4 3 
sym unrealizable (k = 2) | 1 1 1.9 2.0 
tie realizable 3 3 LT 6 
full-non-sym realizable 4 4 1.4 A 
full-sym unrealizable (k = 2) | 1 1 4.3 6.2 
full-tie realizable 9 5 1802.7 | 5.2 
Encoder/Decoder 1-2-hamming-2 realizable 4 1 1.6 3 
1-2-fault-tolerant | unrealizable (k = 2) | 1 - 54.9 - 
1-3-fault-tolerant | realizable 4 1 151.7 7 
2-2-hamming-2 unrealizable (k = 3 1 - 0.6 
2-3-hamming-2 realizable 16 1 >1h 5 
2-3-hamming-3 unrealizable (k = 3) | - 1 - 26.7 
CAP Theorem cap-2-linear realizable 8 1 7.0 3 
cap-2 unrealizable (k = 2) | 1 - 1823.9 | - 
ca-2 realizable - J - 4.4 
ca-3 realizable - 1 - 5.0 
cp-2 realizable 1 1 1.8 6 
cp-3 realizable 1 1 3.2 0.6 
ap-2 realizable - 1 - 2.0 
ap-3 realizable - 1 - 43.4 
Bus Protocol NI1 unrealizable (k = 2) 1 75.2 69.6 
NI2 realizable 8 8 24.1 33.9 
Dining Cryptographers | secrecy realizable - 1 - 82.4 


7 Conclusion 


In this paper, we have considered the reactive realizability problem for specifica- 
tions given in the temporal logic HyperLTL. We gave a complete characterization 
of the decidable fragments based on the quantifier prefix and, additionally, iden- 
tified a decidable fragment in the, in general undecidable, universal fragment of 
HyperLTL. Furthermore, we presented two algorithms to detect realizable and 
unrealizable HyperLTL specifications, one based on bounding the system imple- 
mentation and one based on bounding the number of counterexample paths. Our 
prototype implementation shows that our approach is able to synthesize systems 
with complex information-flow properties. 




References 


1. Agrawal, S., Bonakdarpour, B.: Runtime verification of k-safety hyperproperties in 
HyperLTL. In: Proceedings of CSF, pp. 239-252. IEEE Computer Society (2016) 

2. Brewer, E.A.: Towards robust distributed systems (abstract). In: Proceedings of 
ACM, p. 7. ACM (2000) 

3. Chaum, D.: Security without identification: transaction systems to make big 
brother obsolete. Commun. ACM 28(10), 1030-1044 (1985) 

4. Clarkson, M.R., Finkbeiner, B., Koleini, M., Micinski, K.K., Rabe, M.N., Sánchez, 
C.: Temporal logics for hyperproperties. In: Abadi, M., Kremer, S. (eds.) POST 
2014. LNCS, vol. 8414, pp. 265-284. Springer, Heidelberg (2014). 
https://doi.org/10.1007/978-3-642-54792-8_15 

5. Clarkson, M.R., Schneider, F.B.: Hyperproperties. J. Comput. Secur. 18(6), 
1157-1210 (2010) 

6. Dimitrova, R., Finkbeiner, B., Kovács, M., Rabe, M.N., Seidl, H.: Model checking 
information flow in reactive systems. In: Kuncak, V., Rybalchenko, A. (eds.) 
VMCAI 2012. LNCS, vol. 7148, pp. 169-185. Springer, Heidelberg (2012). 
https://doi.org/10.1007/978-3-642-27940-9_12 

7. Faymonville, P., Finkbeiner, B., Rabe, M.N., Tentrup, L.: Encodings of bounded 
synthesis. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10205, pp. 
354-370. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54577-5_20 

8. Faymonville, P., Finkbeiner, B., Tentrup, L.: BoSy: an experimentation framework 
for bounded synthesis. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, 
vol. 10427, pp. 325-332. Springer, Cham (2017). 
https://doi.org/10.1007/978-3-319-63390-9_17 

9. Finkbeiner, B., Hahn, C.: Deciding hyperproperties. In: Proceedings of CONCUR. 
LIPIcs, vol. 59, pp. 13:1-13:14. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik 
(2016) 

10. Finkbeiner, B., Hahn, C., Stenger, M., Tentrup, L.: Monitoring hyperproperties. 
In: Lahiri, S., Reger, G. (eds.) RV 2017. LNCS, vol. 10548, pp. 190-207. Springer, 
Cham (2017). https://doi.org/10.1007/978-3-319-67531-2_12 

11. Finkbeiner, B., Jacobs, S.: Lazy synthesis. In: Kuncak, V., Rybalchenko, A. (eds.) 
VMCAI 2012. LNCS, vol. 7148, pp. 219-234. Springer, Heidelberg (2012). 
https://doi.org/10.1007/978-3-642-27940-9_15 

12. Finkbeiner, B., Rabe, M.N., Sánchez, C.: Algorithms for model checking HyperLTL 
and HyperCTL*. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, 
vol. 9206, pp. 30-48. Springer, Cham (2015). 
https://doi.org/10.1007/978-3-319-21690-4_3 

13. Finkbeiner, B., Schewe, S.: Uniform distributed synthesis. In: Proceedings of LICS, 
pp. 321-330. IEEE Computer Society (2005) 

14. Finkbeiner, B., Schewe, S.: Bounded synthesis. STTT 15(5-6), 519-539 (2013) 

15. Finkbeiner, B., Tentrup, L.: Detecting unrealizable specifications of distributed 
systems. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 
78-92. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_6 

16. Finkbeiner, B., Tentrup, L.: Detecting unrealizability of distributed fault-tolerant 
systems. Log. Methods Comput. Sci. 11(3) (2015) 

17. Finkbeiner, B., Zimmermann, M.: The first-order logic of hyperproperties. In: 
Proceedings of STACS. LIPIcs, vol. 66, pp. 30:1-30:14. Schloss Dagstuhl - 
Leibniz-Zentrum fuer Informatik (2017) 




18. Kupferman, O., Vardi, M.Y.: Safraless decision procedures. In: Proceedings of 
FOCS, pp. 531-542. IEEE Computer Society (2005) 

19. Manna, Z., Pnueli, A.: Temporal Verification of Reactive Systems - Safety. Springer, 
New York (1995). https://doi.org/10.1007/978-1-4612-4222-2 

20. de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., 
Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337-340. Springer, Heidelberg 
(2008). https://doi.org/10.1007/978-3-540-78800-3_24 

21. Pnueli, A., Rosner, R.: Distributed reactive systems are hard to synthesize. In: 
Proceedings of FOCS, pp. 746-757. IEEE Computer Society (1990) 

22. Post, E.L.: A variant of a recursively unsolvable problem. Bull. Am. Math. Soc. 
52(4), 264-268 (1946) 


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the chapter’s 
Creative Commons license, unless indicated otherwise in a credit line to the material. If 
material is not included in the chapter’s Creative Commons license and your intended 
use is not permitted by statutory regulation or exceeds the permitted use, you will 
need to obtain permission directly from the copyright holder. 




Reactive Control Improvisation 


Daniel J. Fremont and Sanjit A. Seshia 


University of California, Berkeley, USA 
{dfremont,sseshia}@berkeley.edu 


Abstract. Reactive synthesis is a paradigm for automatically build- 
ing correct-by-construction systems that interact with an unknown or 
adversarial environment. We study how to do reactive synthesis when 
part of the specification of the system is that its behavior should be 
random. Randomness can be useful, for example, in a network protocol 
fuzz tester whose output should be varied, or a planner for a surveillance 
robot whose route should be unpredictable. However, existing reactive 
synthesis techniques do not provide a way to ensure random behavior 
while maintaining functional correctness. Towards this end, we general- 
ize the recently-proposed framework of control improvisation (CI) to add 
reactivity. The resulting framework of reactive control improvisation pro- 
vides a natural way to integrate a randomness requirement with the usual 
functional specifications of reactive synthesis over a finite window. We 
theoretically characterize when such problems are realizable, and give a 
general method for solving them. For specifications given by reachability 
or safety games or by deterministic finite automata, our method yields a 
polynomial-time synthesis algorithm. For various other types of specifi- 
cations including temporal logic formulas, we obtain a polynomial-space 
algorithm and prove matching PSPACE-hardness results. We show that 
all of these randomized variants of reactive synthesis are no harder in a 
complexity-theoretic sense than their non-randomized counterparts. 


1 Introduction 


Many interesting programs, including protocol handlers, task planners, and con- 
current software generally, are open systems that interact over time with an 
external environment. Synthesis of such reactive systems requires finding an 
implementation that satisfies the desired specification no matter what the envi- 
ronment does. This problem, reactive synthesis, has a long history (see [7] for 
a survey). Reactive synthesis from temporal logic specifications [19] has been 
particularly well-studied and is being increasingly used in applications such as 
hardware synthesis [3] and robotic task planning [15]. 

In this paper, we investigate how to synthesize reactive systems with random 
behavior: in fact, systems where being random in a prescribed way is part of 
their specification. This is in contrast to prior work on stochastic games where 
randomness is used to model uncertain environments or randomized strategies 
are merely allowed, not required. Solvers for stochastic games may incidentally 
produce randomized strategies to satisfy a functional specification (and some 
types of specification, e.g. multi-objective queries [4], may only be realizable by 
randomized strategies), but do not provide a general way to enforce randomness. 
Unlike most specifications used in reactive synthesis, our randomness requirement 
is a property of a system’s distribution of behaviors, not of an individual 
behavior. While probabilistic specification languages like PCTL [12] can capture 
some such properties, the simple and natural randomness requirement we 
study here cannot be concisely expressed by existing languages (even those as 
powerful as SGL [2]). Thus, randomized reactive synthesis in our sense requires 
significantly different methods than those previously studied. 

© The Author(s) 2018 
H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 307-326, 2018. 
https://doi.org/10.1007/978-3-319-96145-3_17 

However, we argue that this type of synthesis is quite useful, because intro- 
ducing randomness into the behavior of a system can often be beneficial, enhanc- 
ing variety, robustness, and unpredictability. Example applications include: 


— Synthesizing a black-box fuzz tester for a network service, we want a program 
that not only conforms to the protocol (perhaps only most of the time) but 
can generate many different sequences of packets: randomness ensures this. 

— Synthesizing a controller for a robot exploring an unknown environment, ran- 
domness provides a low-memory way to increase coverage of the space. It can 
also help to reduce systematic bias in the exploration procedure. 

— Synthesizing a controller for a patrolling surveillance robot, introducing ran- 
domness in planning makes the robot’s future location harder to predict. 


Adding randomness to a system in an ad hoc way could easily compromise its 
correctness. This paper shows how a randomness requirement can be integrated 
into the synthesis process, ensuring correctness as well as allowing trade-offs to 
be explored: how much randomness can be added while staying correct, or how 
strong can a specification be while admitting a desired amount of randomness? 

To formalize randomized reactive synthesis we build on the idea of control 
improvisation, introduced in [6], formalized in [9], and further generalized in [8]. 
Control improvisation (CI) is the problem of constructing an improviser, a prob- 
abilistic algorithm which generates finite words subject to three constraints: a 
hard constraint that must always be satisfied, a soft constraint that need only 
be satisfied with some probability, and a randomness constraint that no word be 
generated with probability higher than a given bound. We define reactive control 
improvisation (RCI), where the improviser generates a word incrementally, alter- 
nating adding symbols with an adversarial environment. To perform synthesis in 
a finite window, we encode functional specifications and environment assump- 
tions into the hard constraint, while the soft and randomness constraints allow 
us to tune how randomness is added to the system. The improviser obtained by 
solving the RCI problem is then a solution to the original synthesis problem. 

The difficulty of solving reactive CI problems depends on the type of speci- 
fication. We study several types commonly used in reactive synthesis, including 
reachability games (and variants, e.g. safety games) and formulas in the tem- 
poral logics LTL and LDL [5,18]. We also investigate the specification types 
studied in [8], showing how the complexity of the CI problem changes when 
adding reactivity. For every type of specification we obtain a randomized synthesis 
algorithm whose complexity matches that of ordinary reactive synthesis 
(in a finite window). This suggests that reactive control improvisation should be 
feasible in applications like robotic task planning where reactive synthesis tools 
have proved effective. 

In summary, the main contributions of this paper are: 


— The reactive control improvisation (RCI) problem definition (Sect. 3); 


— The notion of width, a quantitative generalization of “winning” game positions 
that measures how many ways a player can win from that position (Sect. 4); 


— A characterization of when RCI problems are realizable in terms of width, 
and an explicit construction of an improviser (Sect. 4); 


— A general method for constructing efficient improvisation schemes (Sect. 5); 


— A polynomial-time improvisation scheme for reachability/safety games and 
deterministic finite automaton specifications (Sect. 6); 


— PSPACE-hardness results for many other specification types including tem- 
poral logics, and matching polynomial-space improvisation schemes (Sect. 7). 


Finally, Sect. 8 summarizes our results and gives directions for future work. 


2 Background 


2.1 Notation 


Given an alphabet $\Sigma$, we write $|w|$ for the length of a finite word $w \in \Sigma^*$, $\lambda$ for the 
empty word, $\Sigma^n$ for the words of length $n$, and $\Sigma^{\le n}$ for $\bigcup_{0 \le i \le n} \Sigma^i$, the set of all 
words of length at most $n$. We abbreviate deterministic/nondeterministic finite 
automaton by DFA/NFA, and context-free grammar by CFG. For an instance 
$\mathcal{X}$ of any such formalism, which we call a specification, we write $L(\mathcal{X})$ for the 
language (subset of $\Sigma^*$) it defines (note the distinction between a language and 
a representation thereof). We view formulas of Linear Temporal Logic (LTL) 
[18] and Linear Dynamic Logic (LDL) [5] as specifications using their natural 
semantics on finite words (see [5]). 

We use the standard complexity classes #P and PSPACE, and the 
PSPACE-complete problem QBF of determining the truth of a quantified Boolean 
formula. For background on these classes and problems see for example [1]. 

Some specifications we use as examples are reachability games [16], where 
players' actions cause transitions in a state space and the goal is to reach a 
target state. We group these games, safety games where the goal is to avoid 
a set of states, and reach-avoid games combining reachability and safety goals 
[20], together as reachability/safety games (RSGs). We draw reachability games 
as graphs in the usual way: squares are adversary-controlled states, and states 
with a double border are target states. 


2.2 Synthesis Games 


Reactive control improvisation will be formalized in terms of a 2-player game 
which is essentially the standard synthesis game used in reactive synthesis [7]. 
However, our formulation is slightly different for compatibility with the definition 
of control improvisation, so we give a self-contained presentation here. 

Fix a finite alphabet $\Sigma$. The players of the game will alternate picking symbols 
from $\Sigma$, building up a word. We can then specify the set of winning plays with 
a language over $\Sigma$. To simplify our presentation we assume that players strictly 
alternate turns and that any symbol from $\Sigma$ is a legal move. These assumptions 
can be relaxed in the usual way by modifying the winning set appropriately. 


Finite Words: While reactive synthesis is usually considered over infinite 
words, in this paper we focus on synthesis in a finite window, as it is unclear 
how best to generalize our randomness requirement to the infinite case. This 
assumption is not too restrictive, as solutions of bounded length are adequate 
for many applications. In fuzz testing, for example, we do not want to gener- 
ate arbitrarily long files or sequences of packets. In robotic planning, we often 
want a plan that accomplishes a task within a certain amount of time. Fur- 
thermore, planning problems with liveness specifications can often be segmented 
into finite pieces: we do not need an infinite route for a patrolling robot, but 
can plan within a finite horizon and replan periodically. Replanning may even 
be necessary when environment assumptions become invalid. At any rate, we 
will see that the bounded case of reactive control improvisation is already highly 
nontrivial. 

As a final simplification, we require that all plays have length exactly $n \in \mathbb{N}$. 
To allow a range $[m,n]$ we can simply add a new padding symbol to $\Sigma$ and 
extend all shorter words to length $n$, modifying the winning set appropriately. 


Definition 2.1. A history $h$ is an element of $\Sigma^{\le n}$, representing the moves of 
the game played so far. We say the game has ended after $h$ if $|h| = n$; otherwise 
it is our turn after $h$ if $|h|$ is even, and the adversary's turn if $|h|$ is odd. 


Definition 2.2. A strategy is a function $\sigma : \Sigma^{\le n} \times \Sigma \to [0,1]$ such that for any 
history $h \in \Sigma^{\le n}$ with $|h| < n$, $\sigma(h,\cdot)$ is a probability distribution over $\Sigma$. We 
write $x \leftarrow \sigma(h)$ to indicate that $x$ is a symbol randomly drawn from $\sigma(h,\cdot)$. 


Since strategies are randomized, fixing strategies for both players does not 
uniquely determine a play of the game, but defines a distribution over plays: 


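Definition 2.3. Given a pair of strategies $(\sigma, \tau)$, we can generate a random 
play $\pi \in \Sigma^n$ as follows. Pick $\pi_0 \leftarrow \sigma(\lambda)$, then for $i$ from 1 to $n - 1$ pick 
$\pi_i \leftarrow \tau(\pi_0 \ldots \pi_{i-1})$ if $i$ is odd and $\pi_i \leftarrow \sigma(\pi_0 \ldots \pi_{i-1})$ otherwise. Finally, put 
$\pi = \pi_0 \ldots \pi_{n-1}$. We write $P_{\sigma,\tau}(\pi)$ for the probability of obtaining the play $\pi$. This 
extends to a set of plays $X \subseteq \Sigma^n$ in the natural way: $P_{\sigma,\tau}(X) = \sum_{\pi \in X} P_{\sigma,\tau}(\pi)$. 
Finally, the set of possible plays is $\Pi_{\sigma,\tau} = \{\pi \in \Sigma^n \mid P_{\sigma,\tau}(\pi) > 0\}$. 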
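
The sampling process of Definition 2.3 can be sketched in a few lines of Python. The encoding is an assumption made for illustration: words are strings, a strategy is a function from a history to a dictionary of symbol probabilities, and the names `sample_play` and `play_probability` as well as the two toy strategies are hypothetical.

```python
import random
from fractions import Fraction
from itertools import product
from typing import Callable, Dict

# Assumed encoding: a strategy maps a history (string) to a distribution over symbols.
Strategy = Callable[[str], Dict[str, Fraction]]


def sample_play(sigma: Strategy, tau: Strategy, n: int, rng: random.Random) -> str:
    """Draw a play of length n; we move at even positions, the adversary at odd ones."""
    play = ""
    for i in range(n):
        dist = sigma(play) if i % 2 == 0 else tau(play)
        symbols = list(dist)
        weights = [float(dist[s]) for s in symbols]
        play += rng.choices(symbols, weights=weights)[0]
    return play


def play_probability(sigma: Strategy, tau: Strategy, play: str) -> Fraction:
    """Exact probability of obtaining `play` under the pair of strategies."""
    p = Fraction(1)
    for i, symbol in enumerate(play):
        dist = sigma(play[:i]) if i % 2 == 0 else tau(play[:i])
        p *= dist.get(symbol, Fraction(0))
    return p


def uniform_ab(history: str) -> Dict[str, Fraction]:
    """Toy strategy for us: pick a or b with equal probability."""
    return {"a": Fraction(1, 2), "b": Fraction(1, 2)}


def always_a(history: str) -> Dict[str, Fraction]:
    """Toy adversary: always answer a."""
    return {"a": Fraction(1)}


n = 4
total = sum(play_probability(uniform_ab, always_a, "".join(w))
            for w in product("ab", repeat=n))
assert total == 1                                   # a distribution over plays
assert play_probability(uniform_ab, always_a, "aaba") == Fraction(1, 4)
play = sample_play(uniform_ab, always_a, n, random.Random(0))
assert play_probability(uniform_ab, always_a, play) > 0   # sampled plays are possible plays
```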




The next definition is just the conditional probability of a play given a history, 
but works for histories with probability zero, simplifying our presentation. 


Definition 2.4. For any history $h = h_0 \ldots h_{k-1} \in \Sigma^{\le n}$ and word $\rho \in \Sigma^{n-k}$, we 
write $P_{\sigma,\tau}(\rho \mid h)$ for the probability that if we assign $\pi_i = h_i$ for $i < k$ and sample 
$\pi_k, \ldots, \pi_{n-1}$ by the process above, then $\pi_k \ldots \pi_{n-1} = \rho$. 


3 Problem Definition 


3.1 Motivating Example 


Consider synthesizing a planner for a surveillance drone operating near another, 
potentially adversarial drone. Discretizing the map into the 7 x 7 grid in Fig. 1 
(ignoring the depicted trajectories for the moment), a route is a word over the 
four movement directions. Our specification is to visit the 4 circled locations in 
30 moves without colliding with the adversary, assuming it cannot move into the 
5 highlighted central locations. 


Fig. 1. Improvised trajectories for a patrolling drone (solid) avoiding an adversary 
(dashed). The adversary may not move into the circles or the square. 


Existing reactive synthesis tools can produce a strategy for the patroller 
ensuring that the specification is always satisfied. However, the strategy may be 
deterministic, so that in response to a fixed adversary the patroller will always 
follow the same route. Then it is easy for a third party to predict the route, 
which could be undesirable, and is in fact unnecessary if there are many other 
ways the drone can satisfy its specification. 

Reactive control improvisation addresses this problem by adding a new type 
of specification to the hard constraint above: a randomness requirement stating 
that no behavior should be generated with probability greater than a threshold 
$\rho$. If we set (say) $\rho = 1/5$, then any controller solving the synthesis problem must 
be able to satisfy the hard constraint in at least 5 different ways, never producing 
any given behavior more than 20% of the time. Our synthesis algorithm can in 
fact compute the smallest $\rho$ for which synthesis is possible, yielding a controller 
that is maximally randomized in that the system's behavior is as close to a 
uniform distribution as possible. 

To allow finer tuning of how randomness is introduced into the controller, 
our definition also includes a soft constraint which need only be satisfied with 
some probability $1 - \epsilon$. This allows us to prefer certain safe behaviors over others. 
In our drone example, we require that with probability at least 3/4, we do not 
visit a circled location twice. 

These hard, soft, and randomness constraints form an instance of our reactive 
control improvisation problem. Encoding the hard and soft constraints as DFAs, 
our algorithm (Sect.6) produced a controller achieving the smallest realizable 
p = 2.2 x 1071”. We tested the controller using the PX4 autopilot [17] to refine 
the generated routes into control actions for a drone simulated in Gazebo [14] 
(videos and code are available online [11]). A selection of resulting trajectories 
are shown in Fig. 1 (the remainder in Appendix A of the full paper [10]): starting 
from the triangles, the patroller's path is solid, the adversary's dashed. The left 
run uses an adversary that moves towards the patroller when possible. The right 
runs, with a simple adversary moving in a fixed loop, illustrate the randomness 
of the synthesized controller. 


3.2 Reactive Control Improvisation 


Our formal notion of randomized reactive synthesis in a finite window is a reac- 
tive extension of control improvisation [8,9], which captures the three types of 
constraint (hard, soft, randomness) seen above. We use the notation of [8] for 
the specifications and languages defining the hard and soft constraints: 


Definition 3.1 ([8]). Given hard and soft specifications $\mathcal{H}$ and $\mathcal{S}$ of languages 
over $\Sigma$, an improvisation is a word $w \in L(\mathcal{H}) \cap \Sigma^n$. It is admissible if $w \in L(\mathcal{S})$. 
The set of all improvisations is denoted $I$, and admissible improvisations $A$. 


Running Example. We will use the following simple example throughout the 
paper: each player may increment (+), decrement (−), or leave unchanged (=) 
a counter which is initially zero. The alphabet is $\Sigma = \{+, -, =\}$, and we set 
$n = 4$. The hard specification $\mathcal{H}$ is the DFA in Fig. 2 requiring that the counter 
stay within $[-2, 2]$. The soft specification $\mathcal{S}$ is a similar DFA requiring that the 
counter end at a nonnegative value. 

Then for example the word ++== is an admissible improvisation, satisfying 
both hard and soft constraints, and so is in $A$. The word +−=− on the other 
hand satisfies $\mathcal{H}$ but not $\mathcal{S}$, so it is in $I$ but not $A$. Finally, +++− does not 
satisfy $\mathcal{H}$, so it is not an improvisation at all and is not in $I$. 
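
The three example words can be checked mechanically. The Python sketch below encodes the counter example directly instead of building the DFAs of Fig. 2; it writes the decrement symbol as the ASCII character `-`, and the helper names are hypothetical.

```python
from itertools import product

SIGMA = "+-="          # increment, decrement (ASCII '-'), leave unchanged
N = 4                  # length of every play
STEP = {"+": 1, "-": -1, "=": 0}


def counter_values(word: str):
    """Counter value after each symbol, starting from zero."""
    values, counter = [], 0
    for symbol in word:
        counter += STEP[symbol]
        values.append(counter)
    return values


def is_improvisation(word: str) -> bool:
    """Hard constraint: length N and the counter always stays within [-2, 2]."""
    return len(word) == N and all(-2 <= c <= 2 for c in counter_values(word))


def is_admissible(word: str) -> bool:
    """Soft constraint on top of the hard one: the counter ends nonnegative."""
    return is_improvisation(word) and counter_values(word)[-1] >= 0


assert is_admissible("++==")                                   # in A
assert is_improvisation("+-=-") and not is_admissible("+-=-")  # in I but not A
assert not is_improvisation("+++-")                            # not in I

# The sets I and A can be enumerated explicitly for this small example.
I = {"".join(w) for w in product(SIGMA, repeat=N) if is_improvisation("".join(w))}
A = {w for w in I if is_admissible(w)}
assert A <= I
```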


Fig. 2. The hard specification DFA $\mathcal{H}$ in our running example. The soft specification 
$\mathcal{S}$ is the same but with only the shaded states accepting. 


A reactive control improvisation problem is defined by $\mathcal{H}$, $\mathcal{S}$, and parameters 
$\epsilon$ and $\rho$. A solution is then a strategy which ensures that the hard, soft, and 
randomness constraints hold against every adversary. Formally, following [8,9]: 


Definition 3.2. Given an RCI instance $C = (H, S, n, \epsilon, \rho)$ with H, S, and n as
above and $\epsilon, \rho \in [0,1] \cap \mathbb{Q}$, a strategy $\sigma$ is an improvising strategy if it satisfies
the following requirements for every adversary $\tau$:


Hard constraint: $P_{\sigma,\tau}(I) = 1$
Soft constraint: $P_{\sigma,\tau}(A) \ge 1 - \epsilon$
Randomness: $\forall \pi \in I,\ P_{\sigma,\tau}(\pi) \le \rho$.


If there is an improvising strategy $\sigma$, we say that C is realizable. An improviser
for C is then an expected-finite time probabilistic algorithm implementing such a
strategy $\sigma$, i.e. whose output distribution on input $h \in \Sigma^{\le n}$ is $\sigma(h, \cdot)$.


Definition 3.3. Given an RCI instance $C = (H, S, n, \epsilon, \rho)$, the reactive control
improvisation (RCI) problem is to decide whether C is realizable, and if so to 
generate an improviser for C. 


Running Example. Suppose we set $\epsilon = 1/2$ and $\rho = 1/2$. Let $\sigma$ be the strategy
which picks + or − with equal probability in the first move, and thenceforth picks
the action which moves the counter closest to ±1 respectively. This satisfies
the hard constraint, since if the adversary ever moves the counter to ±2 we
immediately move it back. The strategy also satisfies the soft constraint, since
with probability 1/2 we set the counter to +1 on the first move, and if the
adversary moves to 0 we move back to +1 and remain nonnegative. Finally, $\sigma$
also satisfies the randomness constraint, since each choice of first move happens 
with probability 1/2 and so no play can be generated with higher probability. 
So o is an improvising strategy and this RCI instance is realizable. 


We will study classes of RCI problems with different types of specifications: 


Definition 3.4. If HSPEC and SSPEC are classes of specifications, then the
class of RCI instances $C = (H, S, n, \epsilon, \rho)$ where $H \in$ HSPEC and $S \in$ SSPEC
is denoted RCI (HSPEC, SSPEC). We use the same notation for the decision
problem associated with the class, i.e., given $C \in$ RCI (HSPEC, SSPEC), decide
whether C is realizable. The size |C| of an RCI instance is the total size of the bit
representations of its parameters, with n represented in unary and $\epsilon, \rho$ in binary.


Finally, a synthesis algorithm in our context takes a specification in the form 
of an RCI instance and produces an implementation in the form of an improviser. 
This corresponds exactly to the notion of an improvisation scheme from [8]:


Definition 3.5 ([8]). A polynomial-time improvisation scheme for a class $\mathcal{P}$
of RCI instances is an algorithm S with the following properties:


Correctness: For any $C \in \mathcal{P}$, if C is realizable then S(C) is an improviser for
C, and otherwise $S(C) = \bot$.

Scheme efficiency: There is a polynomial $p : \mathbb{R} \to \mathbb{R}$ such that the runtime of
S on any $C \in \mathcal{P}$ is at most $p(|C|)$.

Improviser efficiency: There is a polynomial $q : \mathbb{R} \to \mathbb{R}$ such that for every
$C \in \mathcal{P}$, if $G = S(C) \ne \bot$ then G has expected runtime at most $q(|C|)$.


The first two requirements simply say that the scheme produces valid improvisers
in polynomial time. The third is necessary to ensure that the improvisers
themselves are efficient: otherwise, the scheme might for example produce impro- 
visers running in time exponential in the size of the specification. 

A main goal of our paper is to determine for which types of specifications 
there exist polynomial-time improvisation schemes. While we do find such algo- 
rithms for important classes of specifications, we will also see that determining 
the realizability of an RCI instance is often PSPACE-hard. Therefore we also 
consider polynomial-space improvisation schemes, defined as above but replac- 
ing time with space. 


4 Existence of Improvisers 


4.1 Width and Realizability 


The most basic question in reactive synthesis is whether a specification is real- 
izable. In randomized reactive synthesis, the question is more delicate because 
the randomness requirement means that it is no longer enough to ensure some 
property regardless of what the adversary does: there must be many ways to do 
so. Specifically, there must be at least $1/\rho$ improvisations if we are to generate
each of them with probability at most $\rho$. Furthermore, at least this many improvisations
must be possible given an unknown adversary: even if many exist, the
adversary may be able to force us to use only a single one. We introduce a new 
notion of the size of a set of plays that takes this into account. 


Definition 4.1. The width of $X \subseteq \Sigma^n$ is $W(X) = \max_\sigma \min_\tau |X \cap \Pi_{\sigma,\tau}|$.


The width counts how many distinct plays can be generated regardless of 
what the adversary does. Intuitively, a “narrow” game—one whose set of winning 
plays has small width—is one in which the adversary can force us to choose 
among only a few winning plays, while in a “wide” one we always have many 
safe choices available. Note that which particular plays can be generated depends 
on the adversary: the width only measures how many can be generated. For 
example, W(X) = 1 means that a play in X can always be generated, but 
possibly a different element of X for different adversaries.


Fig. 3. Synthesis game for our running example. States are labeled with the widths of
I (left) and A (right) given a history ending at that state.


Running Example. Figure 3 shows the synthesis game for our running example: 
paths ending in circled or shaded states are plays in I or A respectively (ignore 
the state labels for now). At left, the bold arrows show the 4 plays in I possible
against the adversary that moves away from 0, and down at 0. This shows
$W(I) \le 4$, and in fact 4 plays are possible against any adversary, so W(I) = 4.
Similarly, at right we see that W(A) = 1. 


It will be useful later to have a relative version of width that counts how 
many plays are possible from a given position: 


Definition 4.2. Given a set of plays $X \subseteq \Sigma^n$ and a history $h \in \Sigma^{\le n}$, the width
of X given h is $W(X|h) = \max_\sigma \min_\tau |\{\pi \mid h\pi \in X \wedge P_{\sigma,\tau}(\pi|h) > 0\}|$.


This is a direct generalization of “winning” positions: if X is the set of winning 
plays, then W(X|h) counts the number of ways to win from h.

We will often use the following basic properties of W(X|h) without comment 
(for lack of space this proof and the details of later proof sketches are deferred 
to Appendix B of the full paper [10]). Note that (3)-(5) provide a recursive way 
to compute widths that we will use later, and which is illustrated by the state 
labels in Fig. 3. 


Lemma 4.1. For any set of plays $X \subseteq \Sigma^n$ and history $h \in \Sigma^{\le n}$:


(1) $0 \le W(X|h) \le |\Sigma|^{n-|h|}$;
(2) $W(X|\lambda) = W(X)$;
(3) if |h| = n, then $W(X|h) = \mathbb{1}_{h \in X}$;
(4) if it is our turn after h, then $W(X|h) = \sum_{u \in \Sigma} W(X|hu)$;
(5) if it is the adversary's turn after h, then $W(X|h) = \min_{u \in \Sigma} W(X|hu)$.
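
Properties (3)-(5) of Lemma 4.1 translate directly into a recursion over histories. The following is a minimal sketch for the running example, assuming that we move first and that turns alternate; the membership tests `in_I` and `in_A` are illustrative stand-ins for the specifications H and S.

```python
SIGMA = "+-="
DELTA = {"+": 1, "-": -1, "=": 0}
N = 4

def in_I(word):
    """Hard constraint: every prefix keeps the counter within [-2, 2]."""
    total = 0
    for s in word:
        total += DELTA[s]
        if not -2 <= total <= 2:
            return False
    return True

def in_A(word):
    """Admissible plays: in I and ending at a nonnegative counter value."""
    return in_I(word) and sum(DELTA[s] for s in word) >= 0

def width(member, h=""):
    """Compute W(X|h), where `member` decides membership of full plays in X."""
    if len(h) == N:
        return 1 if member(h) else 0              # property (3)
    child_widths = [width(member, h + u) for u in SIGMA]
    our_turn = len(h) % 2 == 0                    # assumption: we move first
    if our_turn:
        return sum(child_widths)                  # property (4)
    return min(child_widths)                      # property (5)

print(width(in_I), width(in_A))
```

This brute-force recursion visits every history, so it is exponential in n; it is meant only to make the definition concrete, not to be efficient.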


Now we can state the realizability conditions, which are simply that I and A
have sufficiently large width. In fact, the conditions turn out to be exactly the 
same as those for non-reactive CI except that width takes the place of size [9].


Theorem 4.1. The following are equivalent:


(1) C is realizable.
(2) $W(I) \ge 1/\rho$ and $W(A) \ge (1 - \epsilon)/\rho$.
(3) There is an improviser for C.


Running Example. We saw above that our example was realizable with $\epsilon = \rho = 1/2$,
and indeed $4 = W(I) \ge 1/\rho = 2$ and $1 = W(A) \ge (1 - \epsilon)/\rho = 1$. However, if
we put $\rho = 1/3$ we violate the second inequality and the instance is not realizable:
essentially, we need to distribute probability $1 - \epsilon = 1/2$ among plays in A (to
satisfy the soft constraint), but since W(A) = 1, against some adversaries we can 
only generate one play in A and would have to give it the whole 1/2 (violating 
the randomness requirement). 
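
Condition (2) of Theorem 4.1 is a pair of inequalities on the two widths, so it can be tested with exact rational arithmetic. The sketch below assumes the widths have already been computed; the function name `is_realizable` is illustrative. The inequalities are multiplied through by ρ so that ρ = 0 needs no special case.

```python
from fractions import Fraction

def is_realizable(width_I, width_A, eps, rho):
    """Check W(I) >= 1/rho and W(A) >= (1 - eps)/rho without dividing by rho."""
    return rho * width_I >= 1 and rho * width_A >= 1 - eps

half, third = Fraction(1, 2), Fraction(1, 3)
print(is_realizable(4, 1, half, half))    # running example with eps = rho = 1/2
print(is_realizable(4, 1, half, third))   # same widths with rho = 1/3
```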


The difficult part of the Theorem is constructing an improviser when the 
inequalities (2) hold. Despite the similarity in these conditions to the non- 
reactive case, the construction is much more involved. We begin with a general 
overview. 


4.2 Improviser Construction: Discussion 


Our improviser can be viewed as an extension of the classical random-walk reduc- 
tion of uniform sampling to counting [21]. In that algorithm (which was used 
in a similar way for DFA specifications in [8,9]), a uniform distribution over 
paths in a DAG is obtained by moving to the next vertex with probability pro- 
portional to the number of paths originating at it. In our case, which plays are 
possible depends on the adversary, but the width still tells us how many plays 
are possible. So we could try a random walk using widths as weights: e.g. on 
the first turn in Fig. 3, picking +, =, and − with probabilities 1/4, 2/4, and 1/4
respectively. Against the adversary shown in Fig. 3, this would indeed yield a
uniform distribution over the four possible plays in I.
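
The classical reduction of uniform sampling to counting mentioned above can be sketched as follows. The DAG here is a hypothetical example, not one from the paper; the point is only that choosing each successor with probability proportional to its path count gives every source-to-sink path the same probability.

```python
import random
from functools import lru_cache

# Hypothetical DAG given as an adjacency list; "t" is the only sink.
DAG = {"s": ["a", "b"], "a": ["t"], "b": ["c", "t"], "c": ["t"], "t": []}

@lru_cache(maxsize=None)
def count_paths(v):
    """Number of paths from v to a sink (a sink counts as one empty path)."""
    successors = DAG[v]
    if not successors:
        return 1
    return sum(count_paths(w) for w in successors)

def sample_path(start, rng=random):
    """Sample a path from `start` to a sink uniformly at random."""
    path = [start]
    while DAG[path[-1]]:
        successors = DAG[path[-1]]
        # Weight each successor by the number of paths that continue through it.
        weights = [count_paths(w) for w in successors]
        path.append(rng.choices(successors, weights=weights)[0])
    return path

print(count_paths("s"), sample_path("s"))
```

In the reactive setting the path counts are replaced by widths, since the set of reachable plays depends on the adversary.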

However, the soft constraint may require a non-uniform distribution. In the 
running example with $\epsilon = \rho = 1/2$, we need to generate the single possible
play in A with probability 1/2, not just the uniform probability 1/4. This is
easily fixed by doing the random walk with a weighted average of the widths
of I and A: specifically, move to position h with probability proportional to
$\alpha W(A|h) + \beta (W(I|h) - W(A|h))$. In the example, this would result in plays
in A getting probability $\alpha$ and those in $I \setminus A$ getting probability $\beta$. Taking $\alpha$
sufficiently large, we can ensure the soft constraint is satisfied.

Unfortunately, this strategy can fail if the adversary makes more plays avail- 
able than the width guarantees. Consider the game on the left of Fig. 4, where 
W(I) = 3 and W(A) = 2. This is realizable with $\epsilon = \rho = 1/3$, but no values of $\alpha$
and $\beta$ yield improvising strategies, essentially because an adversary moving from
X to Z breaks the worst-case assumption that the adversary will minimize the
number of possible plays by moving to Y. In fact, this instance is realizable but
not by any memoryless strategy. To see this, note that all such strategies can be
parametrized by the probabilities p and q in Fig. 4. To satisfy the randomness
constraint against the adversary that moves from X to Y, both p and (1 − p)q
must be at most 1/3. To satisfy the soft constraint against the adversary that
moves from X to Z we must have $pq + (1 - p)q \ge 2/3$, so $q \ge 2/3$. But then
$(1 - p)q \ge (1 - 1/3)(2/3) = 4/9 > 1/3$, a contradiction.


Fig. 4. Reachability games where a naive random walk, and all memoryless strategies,
fail (left) and where no strategy can optimize either $\epsilon$ or $\rho$ against every adversary
simultaneously (right).

To fix this problem, our improvising strategy $\hat{\sigma}$ (which we will fully specify
in Algorithm 1 below) takes a simplistic approach: it tracks how many plays
in A and I are expected to be possible based on their widths, and if more are
available it ignores them. For example, entering state Z from X there are 2 ways
to produce a play in I, but since W(I|X) = 1 we ignore the play in $I \setminus A$. Extra
plays in A are similarly ignored by being treated as members of $I \setminus A$. Ignoring
unneeded plays may seem wasteful, but the proof of Theorem 4.1 will show that
$\hat{\sigma}$ nevertheless achieves the best possible $\epsilon$:


Corollary 4.1. C is realizable iff $W(I) \ge 1/\rho$ and $\epsilon \ge \epsilon_{opt} := \max(1 - \rho W(A), 0)$.
Against any adversary, the error probability of Algorithm 1 is at
most $\epsilon_{opt}$.


Thus, if any improviser can achieve an error probability $\epsilon$, ours does. We could
ask for a stronger property, namely that against each adversary the improviser
achieves the smallest possible error probability for that adversary. Unfortunately,
this is impossible in general. Consider the game on the right in Fig. 4, with $\rho = 1$.
Against the adversary which always moves up, we can achieve $\epsilon = 0$ with the
strategy that at P moves to Q. We can also achieve $\epsilon = 0$ against the adversary
that always moves down, but only with a different strategy, namely the one
that at P moves to R. So there is no single strategy that achieves the optimal
$\epsilon$ for every adversary. A similar argument shows that there is also no strategy
achieving the smallest possible $\rho$ for every adversary. In essence, optimizing $\epsilon$ or
$\rho$ in every case would require the strategy to depend on the adversary.


4.3 Improviser Construction: Details 


Our improvising strategy, as outlined in the previous section, is shown in Algorithm 1.
We first compute $\alpha$ and $\beta$, the (maximum) probabilities for generating
elements of A and $I \setminus A$ respectively. As in [8], we take $\alpha$ as large as possible
given $\alpha \le \rho$, and determine $\beta$ from the probability left over (modulo a couple
corner cases).


Algorithm 1. the strategy $\hat{\sigma}$

    $\alpha \gets \min(\rho, 1/W(A))$    (or 0 instead if W(A) = 0)
    $\beta \gets (1 - \alpha W(A)) / (W(I) - W(A))$    (or 0 instead if W(I) − W(A) = 0)
    $m^A \gets W(A)$, $m^I \gets W(I)$
    $h \gets \lambda$
    while the game is not over after h do
        if it is our turn after h then
            $m^A_u, m^I_u \gets$ PARTITION($m^A$, $m^I$, h)    ▷ returns values for each $u \in \Sigma$
            for each $u \in \Sigma$, put $t_u \gets \alpha m^A_u + \beta (m^I_u - m^A_u)$
            pick $u \in \Sigma$ with probability proportional to $t_u$ and append it to h
            $m^A \gets m^A_u$, $m^I \gets m^I_u$
        else
            the adversary picks $u \in \Sigma$ given the history h; append it to h
    return h



Fig. 5. A run of Algorithm 1, labeling states with corresponding widths of I (left) and 
A (right). 


Next we initialize $m^A$ and $m^I$, our expectations for how many plays in A and
I respectively are still possible to generate. Initially these are given by W(A)
and W(I), but as we saw above it is possible for more plays to become available.
The function PARTITION handles this, deciding which $m^A$ (resp., $m^I$) out of
the available W(A|h) (W(I|h)) plays we will use. The behavior of PARTITION is
defined by the following lemma; its proof (in Appendix B [10]) greedily takes the
first $m^A$ possible plays in A under some canonical order and the first $m^I - m^A$
of the remaining plays in I.


Lemma 4.2. If it is our turn after $h \in \Sigma^{\le n}$, and $m^A, m^I \in \mathbb{Z}$ satisfy
$0 \le m^A \le m^I \le W(I|h)$ and $m^A \le W(A|h)$, there are integer partitions $\sum_{u \in \Sigma} m^A_u$
and $\sum_{u \in \Sigma} m^I_u$ of $m^A$ and $m^I$ respectively such that $0 \le m^A_u \le m^I_u \le W(I|hu)$
and $m^A_u \le W(A|hu)$ for all $u \in \Sigma$. These are computable in poly-time given
oracles for $W(I|\cdot)$ and $W(A|\cdot)$.


Finally, we perform the random walk, moving from position h to hu with
(unnormalized) probability $t_u$, the weighted average described above.


Running Example. With $\epsilon = \rho = 1/2$, as before W(A) = 1 and W(I) = 4
so $\alpha = 1/2$ and $\beta = 1/6$. On the first move, $m^A$ and $m^I$ match W(A|h) and
W(I|h), so all plays are used and PARTITION returns (W(A|hu), W(I|hu)) for
each $u \in \Sigma$. Looking up these values in Fig. 5, we see $(m^A_=, m^I_=) = (0, 2)$ and
so $t(=) = 2\beta = 1/3$. Similarly $t(+) = \alpha = 1/2$ and $t(-) = \beta = 1/6$. We
choose an action according to these weights; suppose =, so that we update
$m^A \gets 0$ and $m^I \gets 2$, and suppose the adversary responds with =. From Fig. 5,
W(A|==) = 1 and W(I|==) = 3, whereas $m^A = 0$ and $m^I = 2$. So PARTITION
discards a play, say returning $(m^A_u, m^I_u) = (0, 1)$ for $u \in \{+, =\}$ and (0, 0) for
$u \in \{-\}$. Then $t(+) = t(=) = \beta = 1/6$ and $t(-) = 0$. So we pick + or =
with equal probability, say +. If the adversary responds with +, we get the play
==++, shown in bold on Fig. 5. As desired, it satisfies the hard constraint.
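
The walk-through above can be reproduced with a small executable sketch of Algorithm 1 for the running example. Everything here is illustrative rather than the authors' implementation: widths are computed by brute-force recursion, we are assumed to move first, and `partition` is one greedy choice satisfying Lemma 4.2, with the symbol order `"+=-"` picked so that it returns the same values as in the text. The sketch also assumes the instance is realizable, so that the weights never sum to zero.

```python
import random
from fractions import Fraction
from functools import lru_cache

SIGMA = "+=-"            # tie-breaking order used by partition() (an assumption)
DELTA = {"+": 1, "-": -1, "=": 0}
N = 4

def in_I(word):
    total = 0
    for s in word:
        total += DELTA[s]
        if abs(total) > 2:
            return False
    return True

def in_A(word):
    return in_I(word) and sum(DELTA[s] for s in word) >= 0

def our_turn(h):
    return len(h) % 2 == 0   # assumption: we move first and turns alternate

@lru_cache(maxsize=None)
def width(kind, h=""):
    """W(A|h) if kind == "A", otherwise W(I|h), via Lemma 4.1 (3)-(5)."""
    if len(h) == N:
        return int(in_A(h) if kind == "A" else in_I(h))
    ws = [width(kind, h + u) for u in SIGMA]
    return sum(ws) if our_turn(h) else min(ws)

def partition(m_A, m_I, h):
    """Greedily split m_A and m_I over the successors of h (cf. Lemma 4.2)."""
    parts, need_A, need_rest = {}, m_A, m_I - m_A
    for u in SIGMA:
        w_A, w_I = width("A", h + u), width("I", h + u)
        a = min(need_A, w_A)            # plays counted as members of A
        b = min(need_rest, w_I - a)     # plays counted as members of I \ A
        parts[u] = (a, a + b)
        need_A, need_rest = need_A - a, need_rest - b
    return parts

def alpha_beta(rho):
    w_A, w_I = width("A"), width("I")
    alpha = min(rho, Fraction(1, w_A)) if w_A else Fraction(0)
    beta = (1 - alpha * w_A) / (w_I - w_A) if w_I != w_A else Fraction(0)
    return alpha, beta

def improvise(rho, adversary, rng=random):
    """Generate one play against `adversary`, a function from history to symbol."""
    alpha, beta = alpha_beta(rho)
    m_A, m_I, h = width("A"), width("I"), ""
    while len(h) < N:
        if our_turn(h):
            parts = partition(m_A, m_I, h)
            weights = [float(alpha * a + beta * (i - a)) for a, i in parts.values()]
            u = rng.choices(list(parts), weights=weights)[0]
            m_A, m_I = parts[u]
        else:
            u = adversary(h)
        h += u
    return h

print(improvise(Fraction(1, 2), lambda h: "="))
```

Here the adversary is passed in as a deterministic function of the history; a randomized adversary could be modeled the same way by letting that function draw from its own random source.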


The next few lemmas establish that $\hat{\sigma}$ is well-defined and in fact an improvising
strategy, allowing us to prove Theorem 4.1. Throughout, we write $m^A(h)$
(resp., $m^I(h)$) for the value of $m^A$ ($m^I$) at the start of the iteration for history
h. We also write $t(h) = \alpha m^A(h) + \beta (m^I(h) - m^A(h))$ (so $t(hu) = t_u$ when we
pick u).


Lemma 4.3. If $W(I) \ge 1/\rho$, then $\hat{\sigma}$ is a well-defined strategy and $P_{\hat{\sigma},\tau}(I) = 1$
for every adversary $\tau$.


Proof (sketch). An easy induction on h shows the conditions of Lemma 4.2 are
always satisfied, and that t(h) is always positive since we never pick a u with
$t_u = 0$. So $\sum_u t_u = t(h) > 0$ and $\hat{\sigma}$ is well-defined. Furthermore, t(h) > 0 implies
$m^I(h) > 0$, so for any $h \in \Pi_{\hat{\sigma},\tau}$ we have $\mathbb{1}_{h \in I} = W(I|h) \ge m^I(h) > 0$ and thus
$h \in I$.


Lemma 4.4. If $W(I) \ge 1/\rho$, then $P_{\hat{\sigma},\tau}(A) \ge \min(\rho W(A), 1)$ for every $\tau$.


Proof (sketch). Because of the $\alpha m^A(h)$ term in the weights t(h), the probability
of obtaining a play in A starting from h is at least $\alpha m^A(h)/t(h)$ (as can be seen
by induction on h in order of decreasing length). Then since $m^A(\lambda) = W(A)$
and $t(\lambda) = 1$ we have $P_{\hat{\sigma},\tau}(A) \ge \alpha W(A) = \min(\rho W(A), 1)$.


Lemma 4.5. If $W(I) \ge 1/\rho$, then $P_{\hat{\sigma},\tau}(\pi) \le \rho$ for every $\pi \in \Sigma^n$ and $\tau$.


Proof (sketch). If the adversary is deterministic, the weights we use for our
random walk yield a distribution where each play $\pi$ has probability either $\alpha$ or
$\beta$ (depending on whether $m^A(\pi) = 1$ or 0). If the adversary assigns nonzero
probability to multiple choices this only decreases the probability of individual
plays. Finally, since $W(I) \ge 1/\rho$ we have $\alpha, \beta \le \rho$.


Proof (of Theorem 4.1). We use a similar argument to that of [8].

(1)⇒(2) Suppose $\sigma$ is an improvising strategy, and fix any adversary $\tau$. Then
$\rho |\Pi_{\sigma,\tau} \cap I| = \sum_{\pi \in \Pi_{\sigma,\tau} \cap I} \rho \ge \sum_{\pi \in I} P_{\sigma,\tau}(\pi) = P_{\sigma,\tau}(I) = 1$, so $|\Pi_{\sigma,\tau} \cap I| \ge 1/\rho$.
Since $\tau$ is arbitrary, this implies $W(I) \ge 1/\rho$. Since $A \subseteq I$, we also
have $\rho |\Pi_{\sigma,\tau} \cap A| = \sum_{\pi \in \Pi_{\sigma,\tau} \cap A} \rho \ge \sum_{\pi \in A} P_{\sigma,\tau}(\pi) = P_{\sigma,\tau}(A) \ge 1 - \epsilon$, so
$|\Pi_{\sigma,\tau} \cap A| \ge (1 - \epsilon)/\rho$ and thus $W(A) \ge (1 - \epsilon)/\rho$.

(2)⇒(3) By Lemmas 4.3 and 4.5, $\hat{\sigma}$ is well-defined and satisfies the hard and
randomness constraints. By Lemma 4.4, $P_{\hat{\sigma},\tau}(A) \ge \min(\rho W(A), 1) \ge 1 - \epsilon$,
so $\hat{\sigma}$ also satisfies the soft constraint and thus is an improvising strategy. Its
transition probabilities are rational, so it can be implemented by an expected
finite-time probabilistic algorithm, which is then an improviser for C.

(3)⇒(1) Immediate.


Proof (of Corollary 4.1). The inequalities in the statement are equivalent to
those of Theorem 4.1 (2). By Lemma 4.4, we have $P_{\hat{\sigma},\tau}(A) \ge \min(\rho W(A), 1)$. So
the error probability is at most $1 - \min(\rho W(A), 1) = \epsilon_{opt}$.


5 A Generic Improviser 


We now use the construction of Sect. 4 to develop a generic improvisation scheme 
usable with any class of specifications SPEC supporting the following operations: 


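Intersection: Given specs $\mathcal{X}$ and $\mathcal{Y}$, find $\mathcal{Z}$ such that $L(\mathcal{Z}) = L(\mathcal{X}) \cap L(\mathcal{Y})$.


Width Measurement: Given a specification $\mathcal{X}$, a length $n \in \mathbb{N}$ in unary, and
a history $h \in \Sigma^{\le n}$, compute W(X|h) where $X = L(\mathcal{X}) \cap \Sigma^n$.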


Efficient algorithms for these operations lead to efficient improvisation 
schemes: 


Theorem 5.1. If the operations on SPEC above take polynomial time (resp. 
space), then RCI(SPEC,SPEC) has a polynomial-time (space) improvisation 
scheme. 


Proof. Given an instance $C = (H, S, n, \epsilon, \rho)$ in RCI (SPEC, SPEC), we first apply
intersection to H and S to obtain $\mathcal{A} \in$ SPEC such that $L(\mathcal{A}) \cap \Sigma^n = A$.
Since intersection takes polynomial time (space), $\mathcal{A}$ has size polynomial in |C|.
Next we use width measurement to compute $W(I) = W(L(H) \cap \Sigma^n | \lambda)$ and
$W(A) = W(L(\mathcal{A}) \cap \Sigma^n | \lambda)$. If these violate the inequalities in Theorem 4.1, then
C is not realizable and we return $\bot$. Otherwise C is realizable, and $\hat{\sigma}$ above is
an improvising strategy. Furthermore, we can construct an expected finite-time
probabilistic algorithm implementing $\hat{\sigma}$, using width measurement to instantiate
the oracles needed by Lemma 4.2. Determining $m^A(h)$ and $m^I(h)$ takes O(n)
invocations of PARTITION, each of which is poly-time relative to the width measurements.
These take time (space) polynomial in |C|, since H and $\mathcal{A}$ have size
polynomial in |C|. As $m^A, m^I \le |\Sigma|^n$, they have polynomial bitwidth and so
the arithmetic required to compute $t_u$ for each $u \in \Sigma$ takes polynomial time.
Therefore the total expected runtime (space) of the improviser is polynomial.

Note that as a byproduct of testing the inequalities in Theorem 4.1, our
algorithm can compute the best possible error probability $\epsilon_{opt}$ given H, S, and
$\rho$ (see Corollary 4.1). Alternatively, given $\epsilon$, we can compute the best possible $\rho$.
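
Both optimal parameters follow from the widths alone. The sketch below assumes W(I) and W(A) are already known; `eps_opt` is the formula of Corollary 4.1, while `rho_opt` is obtained by rearranging the inequalities of Theorem 4.1 (2) for ρ, which is our own rearrangement rather than a formula stated in the paper. Function names are illustrative.

```python
from fractions import Fraction

def eps_opt(width_I, width_A, rho):
    """Smallest feasible epsilon for a given rho, or None if none exists."""
    if rho * width_I < 1:
        return None          # randomness constraint cannot be met for this rho
    return max(1 - rho * width_A, Fraction(0))

def rho_opt(width_I, width_A, eps):
    """Smallest feasible rho for a given epsilon, or None if none exists."""
    if width_I == 0 or (width_A == 0 and eps < 1):
        return None          # hard or soft constraint cannot be met at all
    bound = Fraction(1, width_I)                  # from W(I) >= 1/rho
    if width_A > 0:
        bound = max(bound, (1 - eps) / width_A)   # from W(A) >= (1 - eps)/rho
    return bound

print(eps_opt(4, 1, Fraction(1, 2)), rho_opt(4, 1, Fraction(1, 2)))
```

For the running example (W(I) = 4, W(A) = 1), halving the allowed error or the allowed per-play probability immediately shows which of the two inequalities becomes binding.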

We will see below how to efficiently compute widths for DFAs, so Theo- 
rem 5.1 yields a polynomial-time improvisation scheme. If we allow polynomial- 
space schemes, we can use a general technique for width measurement that only 
requires a very weak assumption on the specifications, namely testability in 
polynomial space: 


Theorem 5.2. RCI (PSA, PSA) has a polynomial-space improvisation scheme, 
where PSA is the class of polynomial-space decision algorithms. 


Proof (sketch). We apply Theorem 5.1, computing widths recursively using
Lemma 4.1 (3)-(5). As in the PSPACE QBF algorithm, the current path in the
recursive tree and required auxiliary storage need only polynomial space. 


6 Reachability Games and DFAs 


Now we develop a polynomial-time improvisation scheme for RCI instances with 
DFA specifications. This also provides a scheme for reachability/safety games, 
whose winning conditions can be straightforwardly encoded as DFAs. 

Suppose D is a DFA with states V, accepting states T, and transition function 
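$\delta : V \times \Sigma \to V$. Our scheme is based on the fact that W(L(D)|h) depends only
on the state of D reached on input h, allowing these widths to be computed by
dynamic programming. Specifically, for all $v \in V$ and $i \in \{0, \dots, n\}$ we define:


$$C(v,i) = \begin{cases} \mathbb{1}_{v \in T} & i = n \\ \min_{u \in \Sigma} C(\delta(v,u), i+1) & i < n \wedge i \text{ odd} \\ \sum_{u \in \Sigma} C(\delta(v,u), i+1) & \text{otherwise.} \end{cases}$$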


Running Example. Figure 6 shows the values C(v, i) in rows from i = n downward.
For example, i = 2 is our turn, so C(1,2) = C(0,3) + C(1,3) + C(2,3) =
1 + 1 + 0 = 2, while i = 3 is the adversary's turn, so C(−3,3) = min{C(−3,4)} =
min{0} = 0. Note that the values in Fig. 6 agree with the widths W(I|h) shown
in Fig. 5. 
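
The table C(v, i) can be filled in bottom-up. The following is a minimal sketch for the running example, assuming the DFA has counter values −3 to 3 as states with ±3 acting as absorbing rejecting states (consistent with the use of C(−3, 4) above), and assuming the shaded accepting states of S are the nonnegative counter values; neither detail is visible in this text, so both are labeled assumptions.

```python
SIGMA = "+-="
DELTA = {"+": 1, "-": -1, "=": 0}
N = 4
STATES = range(-3, 4)      # assumption: -3 and 3 are absorbing sink states

def delta(v, u):
    """Transition function of the counter DFA."""
    return v if abs(v) == 3 else v + DELTA[u]

def width_table(accepting):
    """Return a dict C with C[v, i] as in the definition of C(v, i)."""
    C = {(v, N): int(v in accepting) for v in STATES}
    for i in range(N - 1, -1, -1):
        for v in STATES:
            nxt = [C[delta(v, u), i + 1] for u in SIGMA]
            # Odd i is the adversary's turn (min); even i is ours (sum).
            C[v, i] = min(nxt) if i % 2 == 1 else sum(nxt)
    return C

C_I = width_table(accepting={-2, -1, 0, 1, 2})   # hard specification H
C_A = width_table(accepting={0, 1, 2})           # assumed product of H and S
print(C_I[1, 2], C_I[-3, 3], C_I[0, 0], C_A[0, 0])
```

The table has |V|(n + 1) entries and each costs O(|Σ|) to fill, which is the source of the polynomial bound.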


Lemma 6.1. For any history $h \in \Sigma^{\le n}$, writing $X = L(D) \cap \Sigma^n$ we have
W(X|h) = C(D(h), |h|), where D(h) is the state reached by running D on h.


Proof. We prove this by induction on i = |h| in decreasing order. In the base case
i = n, we have $W(X|h) = \mathbb{1}_{h \in X} = \mathbb{1}_{D(h) \in T} = C(D(h), n)$. Now take any history
$h \in \Sigma^{\le n}$ with |h| = i < n. By hypothesis, for any $u \in \Sigma$ we have W(X|hu) =
C(D(hu), i + 1). If it is our turn after h, then $W(X|h) = \sum_{u \in \Sigma} W(X|hu) =
\sum_{u \in \Sigma} C(D(hu), i + 1) = C(D(h), i)$ as desired. If instead it is the adversary's
turn after h, then $W(X|h) = \min_{u \in \Sigma} W(X|hu) = \min_{u \in \Sigma} C(D(hu), i + 1) =
C(D(h), i)$ again as desired. So by induction the hypothesis holds for any i.


Fig. 6. The hard specification DFA H in our running example, showing how W(I|h)
is computed.


Theorem 6.1. RCI (DFA, DFA) has a polynomial-time improvisation scheme. 


Proof. We implement Theorem 5.1. Intersection can be done with the standard 
product construction. For width measurement we compute the quantities C(v, i) 
by dynamic programming (from i = n down to i = 0) and apply Lemma 6.1. 
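
The intersection step of this proof is the textbook product construction. The sketch below uses a hypothetical tuple encoding of a DFA, (states, start, accepting, transition dict), chosen for illustration only; the two example automata are not from the paper.

```python
from itertools import product

def intersect(dfa1, dfa2, alphabet):
    """Product DFA accepting exactly the words accepted by both inputs."""
    states1, start1, acc1, d1 = dfa1
    states2, start2, acc2, d2 = dfa2
    states = set(product(states1, states2))
    delta = {((p, q), u): (d1[p, u], d2[q, u])
             for (p, q) in states for u in alphabet}
    accepting = {(p, q) for (p, q) in states if p in acc1 and q in acc2}
    return states, (start1, start2), accepting, delta

def accepts(dfa, word):
    """Run the DFA on `word` and report whether it ends in an accepting state."""
    _, state, accepting, delta = dfa
    for u in word:
        state = delta[state, u]
    return state in accepting

# Hypothetical example over {a, b}: even number of a's, and last symbol is b.
even_a = ({0, 1}, 0, {0}, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1})
ends_b = ({0, 1}, 0, {1}, {(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1})
both = intersect(even_a, ends_b, "ab")
print(accepts(both, "aab"), accepts(both, "ab"))
```

The product has |V1| · |V2| states, so a single intersection is polynomial, although conjoining many specifications this way multiplies the state counts.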


7 Temporal Logics and Other Specifications 


In this section we analyze the complexity of reactive control improvisation for 
specifications in the popular temporal logics LTL and LDL. We also look at NFA 
and CFG specifications, previously studied for non-reactive CI [8], to see how 
their complexities change in the reactive case. 

For LTL specifications, reactive control improvisation is PSPACE-hard 
because this is already true of ordinary reactive synthesis in a finite window 
(we suspect this has been observed but could not find a proof in the literature). 


Theorem 7.1. Finite-window reactive synthesis for LTL is PSPACE-hard. 


Proof (sketch). Given a QBF $\phi = \exists x \forall y \dots \chi$, we can view assignments to its
variables as traces over a single proposition. In polynomial time we can construct
an LTL formula $\psi$ whose models are the satisfying assignments of $\chi$. Then there
is a winning strategy to generate a play satisfying $\psi$ iff $\phi$ is true.


Corollary 7.1. RCI (LTL, $\Sigma^*$) and RCI ($\Sigma^*$, LTL) are PSPACE-hard.


This is perhaps disappointing, but is an inevitable consequence of LTL subsuming
Boolean formulas. On the other hand, our general polynomial-space scheme
applies to LTL and its much more expressive generalization LDL: 


Theorem 7.2. RCI (LDL, LDL) has a polynomial-space improvisation scheme. 


Proof. This follows from Theorem 5.2, since satisfaction of an LDL formula by 
a finite word can be checked in polynomial time (e.g. by combining dynamic 
programming on subformulas with a regular expression parser).

Thus for temporal logics polynomial-time algorithms are unlikely, but adding
randomization to reactive synthesis does not increase its complexity.

The same is true for NFA and CFG specifications, where it is again PSPACE- 
hard to find even a single winning strategy: 


Theorem 7.3. Finite-window reactive synthesis for NFAs is PSPACE-hard. 


Proof (sketch). Reduce from QBF as in Theorem 7.1, constructing an NFA
accepting the satisfying assignments of $\chi$ (as done in [13]).


Corollary 7.2. RCI (NFA, $\Sigma^*$) and RCI ($\Sigma^*$, NFA) are PSPACE-hard.


Theorem 7.4. RCI (CFG, CFG) has a polynomial-space improvisation scheme. 


Proof. By Theorem 5.2, since CFG parsing can be done in polynomial time. 


Since NFAs can be converted to CFGs in polynomial time, this completes 
the picture for the kinds of CI specifications previously studied. In non-reactive 
CI, DFA specifications admit a polynomial-time improvisation scheme while for 
NFAs/CFGs the CI problem is #P-equivalent [8]. Adding reactivity, DFA spec- 
ifications remain polynomial-time while NFAs and CFGs move up to PSPACE. 


Table 1. Complexity of the reactive control improvisation problem for various types 
of hard and soft specifications H, S. Here PSPACE indicates that checking realizability 
is PSPACE-hard, and that there is a polynomial-space improvisation scheme. 


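H\S || RSG       | DFA       | NFA    | CFG    | LTL    | LDL
RSG || poly-time | poly-time | PSPACE | PSPACE | PSPACE | PSPACE
DFA || poly-time | poly-time | PSPACE | PSPACE | PSPACE | PSPACE
NFA || PSPACE    | PSPACE    | PSPACE | PSPACE | PSPACE | PSPACE
CFG || PSPACE    | PSPACE    | PSPACE | PSPACE | PSPACE | PSPACE
LTL || PSPACE    | PSPACE    | PSPACE | PSPACE | PSPACE | PSPACE
LDL || PSPACE    | PSPACE    | PSPACE | PSPACE | PSPACE | PSPACE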


8 Conclusion 


In this paper we introduced reactive control improvisation as a framework for 
modeling reactive synthesis problems where random but controlled behavior is 
desired. RCI provides a natural way to tune the amount of randomness while 
ensuring that safety or other constraints remain satisfied. We showed that RCI 
problems can be efficiently solved in many cases occurring in practice, giving a 
polynomial-time improvisation scheme for reachability/safety or DFA specifications.
We also showed that RCI problems with specifications in LTL or LDL, popularly
used in planning, have the PSPACE-hardness typical of bounded games,
and gave a matching polynomial-space improvisation scheme. This scheme generalizes
to any specification checkable in polynomial space, including NFAs, CFGs,
and many more expressive formalisms. Table 1 summarizes these results. 

These results show that, at a high level, finding a maximally-randomized 
strategy using RCI is no harder than finding any winning strategy at all: for 
specifications yielding games solvable in polynomial time (respectively, space), 
we gave polynomial-time (space) improvisation schemes. We therefore hope that 
in applications where ordinary reactive synthesis has proved tractable, our notion 
of randomized reactive synthesis will also. In particular, we expect our DFA 
scheme to be quite practical, and are experimenting with applications in robotic 
planning. On the other hand, our scheme for temporal logic specifications seems 
unlikely to be useful in practice without further refinement. An interesting direc- 
tion for future work would be to see if modern solvers for quantified Boolean 
formulas (QBF) could be leveraged or extended to solve these RCI problems. 
This could be useful even for DFA specifications, as conjoining many simple 
properties can lead to exponentially-large automata. Symbolic methods based 
on constraint solvers would avoid such blow-up. 

We are also interested in extending the RCI problem definition to unbounded 
or infinite words, as typically used in reactive synthesis. These extensions, as 
well as that to continuous signals, would be useful in robotic planning, cyber- 
physical system testing, and other applications. However, it is unclear how best 
to adapt our randomness constraint to settings where the improviser can gen- 
erate infinitely many words. In such settings the improviser could assign arbi- 
trarily small or even zero probability to every word, rendering the randomness 
constraint trivial. Even in the bounded case, RCI extensions with more complex 
randomness constraints than a simple upper bound on individual word probabil- 
ities would be worthy of study. One possibility would be to more directly control 
diversity and/or unpredictability by requiring the distribution of the improviser’s 
output to be close to uniform after transformation by a given function.


Abstract. Proof by coupling is a classical technique for proving properties
about pairs of randomized algorithms by carefully relating (or
coupling) two probabilistic executions. In this paper, we show how to 
automatically construct such proofs for probabilistic programs. First, we 
present f-coupled postconditions, an abstraction describing two corre- 
lated program executions. Second, we show how properties of f-coupled 
postconditions can imply various probabilistic properties of the original 
programs. Third, we demonstrate how to reduce the proof-search problem
to a purely logical synthesis problem of the form $\exists f.\ \forall X.\ \varphi$, making
probabilistic reasoning unnecessary. We develop a prototype implemen- 
tation to automatically build coupling proofs for probabilistic properties, 
including uniformity and independence of program expressions. 


1 Introduction 


In this paper, we aim to automatically synthesize coupling proofs for probabilis- 
tic programs and properties. Originally designed for proving properties compar- 
ing two probabilistic programs—so-called relational properties—a coupling proof 
describes how to correlate two executions of the given programs, simulating both 
programs with a single probabilistic program. By reasoning about this combined, 
coupled process, we can often give simpler proofs of probabilistic properties for 
the original pair of programs. 

A number of recent works have leveraged this idea to verify relational prop- 
erties of randomized algorithms, including differential privacy [8, 10, 12], security 
of cryptographic protocols [9], convergence of Markov chains [11], robustness of 
machine learning algorithms [7], and more. Recently, Barthe et al. [6] showed 
how to reduce certain non-relational properties—which describe a single prob- 
abilistic program—to relational properties of two programs, by duplicating the 
original program or by sequentially composing it with itself. 

While coupling proofs can simplify reasoning about probabilistic properties, 
they are not so easy to use; most existing proofs are carried out manually in 
relational program logics using interactive theorem provers. In a nutshell, the
main challenge in a coupling proof is to select a correlation for each pair of corresponding
sampling instructions, aiming to induce a particular relation between
the outputs of the coupled process; this relation then implies the desired rela- 
tional property. Just like finding inductive invariants in proofs for deterministic 
programs, picking suitable couplings in proofs can require substantial ingenuity. 

To ease this task, we recently showed how to cast the search for coupling 
proofs as a program synthesis problem [1], giving a way to automatically find 
sophisticated proofs of differential privacy previously beyond the reach of auto- 
mated verification. In the present paper, we build on this idea and present a 
general technique for constructing coupling proofs, targeting uniformity and 
probabilistic independence properties. Both are fundamental properties in the 
analysis of randomized algorithms, either in their own right or as prerequisites 
to proving more sophisticated guarantees; uniformity states that a randomized 
expression takes on all values in a finite range with equal probability, while 
probabilistic independence states that two probabilistic expressions are some- 
how uncorrelated—learning the value of one reveals no additional information 
about the value of the other. 

Our techniques are inspired by the automated proofs of differential privacy we 
considered previously [1], but the present setting raises new technical challenges. 


Non-lockstep execution. To prove differential privacy, the behavior of a 
single program is compared on two related inputs. To take advantage of the 
identical program structure, previous work restricted attention to synchroniz- 
ing proofs, where the two executions can be analyzed assuming they follow 
the same control flow path. In contrast, coupling proofs for uniformity and 
independence often require relating two programs with different shapes, pos- 
sibly following completely different control flows [6]. 

To overcome this challenge, we take a different approach. Instead of incre- 
mentally finding couplings for corresponding pairs of sampling instructions— 
requiring the executions to be tightly synchronized—we first lift all sampling 
instructions to the front of the program and pick a coupling once and for all. 
The remaining execution of both programs can then be encoded separately, 
with no need for lockstep synchronization (at least for loop-free programs— 
looping programs require a more careful treatment). 

Richer space of couplings. The heart of a coupling proof is selecting— 
among multiple possible options—a particular correlation for each pair of 
random sampling instructions. Random sampling in differentially private programs
typically uses highly domain-specific distributions, like the Laplace distribution,
which support a small number of useful couplings. Our prior work
leveraged this feature to encode a collection of primitive couplings into the 
synthesis system. However, this is no longer possible when programs sample 
from distributions supporting richer couplings, like the uniform distribution. 
Since our approach coalesces all sampling instructions at the beginning of 
the program (more generally, at the head of the loop), we also need to find 
couplings for products of distributions. 
We address this problem in two ways. First, we allow couplings of two
sampling instructions to be specified by an injective function f from one 
range to another. Then, we impose requirements—encoded as standard logi- 
cal constraints—to ensure that f indeed represents a coupling; we call such 
couplings f-couplings. 

More general class of properties. Finally, we consider a broad class of 
properties rather than just differential privacy. While we focus on uniformity 
and independence for concreteness, our approach can establish general equal- 
ities between products of probabilities, i.e., probabilistic properties of the 
form 


\[
  \prod_{i=1}^{m} \Pr[e_i \in E_i] = \prod_{j=1}^{n} \Pr[e'_j \in E'_j],
\]

where $e_i$ and $e'_j$ are program expressions in the first and second programs
respectively, and $E_i$ and $E'_j$ are predicates. As an example, we automatically
establish a key step in the proof of Bertrand's Ballot theorem [20]. 


Paper Outline. After overviewing our technique on a motivating example 
(Sect. 2), we detail our main contributions. 


Proof technique: We introduce f-coupled postconditions, a form of postcon- 
dition for two probabilistic programs where random sampling instructions in 
the two programs are correlated by a function f. Using f-coupled postcon- 
ditions, we present proof rules for establishing uniformity and independence 
of program variables, fundamental properties in the analysis of randomized 
algorithms (Sect. 3). 

Reduction to constraint-based synthesis: We demonstrate how to automatically
find coupling proofs by transforming our proof rules into logical constraints
of the form $\exists f.\ \forall X.\ \varphi$, a synthesis problem. A satisfiable
constraint shows the existence of a function f (essentially, a compact encoding
of a coupling proof) implying the target property (Sect. 4).

Extension to looping programs: We extend our technique to reason about 
loops, by requiring synchronization at the loop head and finding a coupled 
invariant (Sect. 5). 

Implementation and evaluation: We implement our technique and evalu- 
ate it on several case studies, automatically constructing coupling proofs for 
interesting properties of a variety of algorithms (Sect. 6). 


We conclude by comparing our technique with related approaches (Sect. 7). 


2 Overview and Illustration


2.1 Introducing f-Couplings 


A Simple Example. We begin by illustrating f-couplings over two identical
Bernoulli distributions, denoted by the following probability mass functions:
$\mu_1(x) = \mu_2(x) = 0.5$ for all $x \in \mathbb{B}$ (where $\mathbb{B} = \{true, false\}$).
In other words, the distribution $\mu_i$ returns true with probability 0.5, and
false with probability 0.5.

An f-coupling for $\mu_1, \mu_2$ is a function $f : \mathbb{B} \to \mathbb{B}$ from the domain of the
first distribution ($\mathbb{B}$) to the domain of the second (also $\mathbb{B}$); f should be injective
and satisfy the monotonicity property: $\mu_1(x) \le \mu_2(f(x))$ for all $x \in \mathbb{B}$. In other
words, f relates each element $x \in \mathbb{B}$ with an element f(x) that has an equal or
larger probability in $\mu_2$. For example, consider the function $f_\neg$ defined as

\[ f_\neg(x) = \neg x. \]

This function relates true in $\mu_1$ with false in $\mu_2$, and vice versa. Observe that
$\mu_1(x) \le \mu_2(f_\neg(x))$ for all $x \in \mathbb{B}$, satisfying the definition of an $f_\neg$-coupling. We
write $\mu_1 \leftrightsquigarrow^{f_\neg} \mu_2$ when there is an $f_\neg$-coupling for $\mu_1$ and $\mu_2$.

Using f-Couplings. An f-coupling can imply useful properties about the
distributions $\mu_1$ and $\mu_2$. For example, suppose we want to prove that
$\mu_1(true) = \mu_2(false)$. The fact that there is an $f_\neg$-coupling of $\mu_1$ and $\mu_2$
immediately implies the equality: by the monotonicity property,

\[
\begin{aligned}
  \mu_1(true)  &\le \mu_2(f_\neg(true))  = \mu_2(false) \\
  \mu_1(false) &\le \mu_2(f_\neg(false)) = \mu_2(true)
\end{aligned}
\]

and, since the probabilities on each side sum to 1, neither inequality can be
strict; therefore $\mu_1(true) = \mu_2(false)$. More generally, it suffices to find an
f-coupling of $\mu_1$ and $\mu_2$ such that

\[
  \underbrace{\{(x, f(x)) \mid x \in \mathbb{B}\}}_{\Psi_f}
  \subseteq \{(z_1, z_2) \mid z_1 = true \iff z_2 = false\},
\]

where $\Psi_f$ is induced by f; in particular, the $f_\neg$-coupling satisfies this property.
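
The two conditions on an f-coupling can be checked mechanically when the distributions are finite. The following Python sketch is an illustration added for the reader, not part of the tool described in this paper; the helper name `is_f_coupling` and the dictionary representation of distributions are assumptions made for the example.

```python
from fractions import Fraction

def is_f_coupling(mu1, mu2, f):
    """Check that f is injective on the domain of mu1 and that
    mu1(x) <= mu2(f(x)) for every x (the monotonicity property)."""
    images = [f(x) for x in mu1]
    injective = len(set(images)) == len(images)
    monotone = all(mu1[x] <= mu2.get(f(x), Fraction(0)) for x in mu1)
    return injective and monotone

half = Fraction(1, 2)
mu1 = {True: half, False: half}
mu2 = {True: half, False: half}

f_neg = lambda x: not x       # the function f_neg(x) = not x from the text
f_const = lambda x: True      # not injective, hence not an f-coupling

print(is_f_coupling(mu1, mu2, f_neg))    # True
print(is_f_coupling(mu1, mu2, f_const))  # False
```

Exact rationals (`Fraction`) are used so that the comparison in the monotonicity check is not affected by floating-point rounding.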


2.2 Simulating a Fair Coin 


Now, let's use f-couplings to prove more interesting properties. Consider the
program fairCoin in Fig. 1; the program simulates a fair coin by flipping a
possibly biased coin that returns true with probability $p \in (0, 1)$, where p is a
program parameter. Our goal is to prove that for any p, the output of the program
is a uniform distribution, i.e., that it simulates a fair coin. We consider two
separate copies of fairCoin generating distributions $\mu_1$ and $\mu_2$ over the returned
value x for the same bias p, and we construct a coupling showing
$\mu_1(true) = \mu_2(false)$, that is, heads and tails have equal probability.

    fun fairCoin(p ∈ (0, 1))
      x ← false
      y ← false
      while x = y do
        x ~ bern(p)
        y ~ bern(p)
      return x

Fig. 1. Simulating a fair coin using an unfair one

Constructing f-Couplings. At first glance, it is unclear how to construct an
f-coupling; unlike the distributions in our simple example, we do not have a
concrete description of $\mu_1$ and $\mu_2$ as uniform distributions (indeed, this is what
we are trying to establish). The key insight is that we do not need to construct
our coupling in one shot. Instead, we can specify a coupling for the concrete,
primitive sampling instructions in the body of the loop (which we know sample
from bern(p)) and then extend to an f-coupling for the whole loop and $\mu_1, \mu_2$.

For each copy of fairCoin, we coalesce the two sampling statements inside the 
loop into a single sampling statement from the product distribution: 


\[ x, y \sim \mathrm{bern}(p) \times \mathrm{bern}(p) \]

We have two such joint distributions $\mathrm{bern}(p) \times \mathrm{bern}(p)$ to couple, one from each
copy of fairCoin. We use the following function $f_{swap} : \mathbb{B}^2 \to \mathbb{B}^2$:

\[ f_{swap}(x, y) = (y, x) \]

which exchanges the values of x and y. Since this is an injective function
satisfying the monotonicity property

\[
  (\mathrm{bern}(p) \times \mathrm{bern}(p))(x, y) \le (\mathrm{bern}(p) \times \mathrm{bern}(p))(f_{swap}(x, y))
\]

for all $(x, y) \in \mathbb{B} \times \mathbb{B}$ and $p \in (0, 1)$, we have an $f_{swap}$-coupling for the two copies
of $\mathrm{bern}(p) \times \mathrm{bern}(p)$.


Analyzing the Loop. To extend an $f_{body}$-coupling on loop bodies to the entire
loop, it suffices to check a synchronization condition: the coupling from $f_{body}$
must ensure that the loop guards are equal so the two executions synchronize at
the loop head. This holds in our case: every time the first program executes the
statement $x, y \sim \mathrm{bern}(p) \times \mathrm{bern}(p)$, we can think of x, y as non-deterministically
set to some values (a, b), and the corresponding variables in the second program
as set to $f_{swap}(a, b) = (b, a)$. The loop guards in the two programs are equivalent
under this choice, since a = b is equivalent to b = a, hence we can analyze the
loops in lockstep. In general, couplings enable us to relate samples from a pair 
of probabilistic assignments as if they were selected non-deterministically, often 
avoiding quantitative reasoning about probabilities. 
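
The conditions discussed for fairCoin (injectivity, monotonicity, synchronized guards, and opposite outputs on exit) can be spot-checked by enumeration. The sketch below is an illustration under stated assumptions: the helper names `bern2` and `check_swap_coupling` are hypothetical, and the check runs over a finite sample of biases, so it is a sanity check rather than a proof for all p.

```python
from fractions import Fraction
from itertools import product

BOOLS = (True, False)

def bern2(p):
    """Probability mass function of bern(p) x bern(p), as a dict."""
    w = {True: p, False: 1 - p}
    return {(x, y): w[x] * w[y] for x, y in product(BOOLS, repeat=2)}

def f_swap(pair):
    x, y = pair
    return (y, x)

def check_swap_coupling(p):
    mu = bern2(p)
    pairs = list(mu)
    # (i) f_swap is injective.
    injective = len({f_swap(v) for v in pairs}) == len(pairs)
    # (ii) Monotonicity: mu(x, y) <= mu(f_swap(x, y)).
    monotone = all(mu[v] <= mu[f_swap(v)] for v in pairs)
    # (iii) Synchronization: both copies evaluate the guard x = y alike.
    guard = lambda v: v[0] == v[1]
    synchronized = all(guard(v) == guard(f_swap(v)) for v in pairs)
    # (iv) On exit (guard false), x is true in copy 1 iff x is false in copy 2.
    opposite = all((v[0] is True) == (f_swap(v)[0] is False)
                   for v in pairs if not guard(v))
    return injective and monotone and synchronized and opposite

# A finite sample of biases; this is a spot check, not a proof for all p.
print(all(check_swap_coupling(Fraction(k, 10)) for k in range(1, 10)))  # True
```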

Our constructed coupling for the loop guarantees that (i) both programs exit 
the loop at the same time, and (ii) when the two programs exit the loop, x takes 
opposite values in the two programs. In other words, there is an $f_{loop}$-coupling
of $\mu_1$ and $\mu_2$ for some function $f_{loop}$ such that

\[
  \Psi_{f_{loop}} \subseteq \{(z_1, z_2) \mid z_1 = true \iff z_2 = false\}, \tag{1}
\]

implying $\mu_1(true) = \mu_2(false)$. Since both distributions are output distributions
of fairCoin (hence $\mu_1 = \mu_2$), we conclude that fairCoin simulates a fair coin.

Note that our approach does not need to construct $f_{loop}$ concretely; this
function may be highly complex. Instead, we only need to show that $\Psi_{f_{loop}}$ (or
some over-approximation) lies inside the target relation in Formula 1.


Achieving Automation. Observe that once we have fixed an $f_{body}$-coupling for
the sampling instructions inside the loop body, checking that the $f_{loop}$-coupling
satisfies the conditions for uniformity (Formula 1) is essentially a program
verification problem. Therefore, we can cast the problem of constructing a coupling
proof as a logical problem of the form $\exists f.\ \forall X.\ \varphi$, where f is the f-coupling we
need to discover and $\forall X.\ \varphi$ is a constraint ensuring that (i) f indeed
represents an f-coupling, and (ii) the f-coupling implies uniformity. Thus, we can
use established synthesis-verification techniques to solve the resulting constraints 
(see, e.g., [2,13,27]). 


3 A Proof Rule for Coupling Proofs 


In this section, we develop a technique for constructing couplings and formalize 
proof rules for establishing uniformity and independence properties over program 
variables. We begin with background on probability distributions and couplings. 


3.1 Distributions and Couplings 


Distributions. A function $\mu : B \to [0, 1]$ defines a distribution over a countable
set B if $\sum_{b \in B} \mu(b) = 1$. We will often write $\mu(A)$ for a subset $A \subseteq B$ to mean
$\sum_{x \in A} \mu(x)$. We write dist(B) for the set of all distributions over B.

We will need a few standard constructions on distributions. First, the support
of a distribution $\mu$ is defined as $supp(\mu) = \{b \in B \mid \mu(b) > 0\}$. Second, for a
distribution on pairs $\mu \in dist(B_1 \times B_2)$, the first and second marginals of $\mu$,
denoted $\pi_1(\mu)$ and $\pi_2(\mu)$ respectively, are distributions over $B_1$ and $B_2$:

\[
  \pi_1(\mu)(b_1) = \sum_{b_2 \in B_2} \mu(b_1, b_2)
  \qquad
  \pi_2(\mu)(b_2) = \sum_{b_1 \in B_1} \mu(b_1, b_2).
\]


Couplings. Let $\Psi \subseteq B_1 \times B_2$ be a binary relation. A $\Psi$-coupling for distributions
$\mu_1$ and $\mu_2$ over $B_1$ and $B_2$ is a distribution $\mu \in dist(B_1 \times B_2)$ with (i) $\pi_1(\mu) = \mu_1$
and $\pi_2(\mu) = \mu_2$; and (ii) $supp(\mu) \subseteq \Psi$. We write $\mu_1 \leftrightsquigarrow^{\Psi} \mu_2$ when there exists a
$\Psi$-coupling between $\mu_1$ and $\mu_2$.

An important fact is that an injective function $f : B_1 \to B_2$ where
$\mu_1(b) \le \mu_2(f(b))$ for all $b \in B_1$ induces a coupling between $\mu_1$ and $\mu_2$; this follows
from a general theorem by Strassen [28], see also [23]. We write $\mu_1 \leftrightsquigarrow^{f} \mu_2$ for
$\mu_1 \leftrightsquigarrow^{\Psi_f} \mu_2$, where $\Psi_f = \{(b_1, f(b_1)) \mid b_1 \in B_1\}$. The existence of a coupling
can imply various useful properties about the two distributions. The following
general fact will be the most important for our purposes: couplings can prove
equalities between probabilities.


Proposition 1. Let $E_1 \subseteq B_1$ and $E_2 \subseteq B_2$ be two events, and let
$\Psi_= \triangleq \{(b_1, b_2) \mid b_1 \in E_1 \iff b_2 \in E_2\}$. If $\mu_1 \leftrightsquigarrow^{\Psi_=} \mu_2$, then $\mu_1(E_1) = \mu_2(E_2)$.
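
For finite distributions, the coupling induced by an injective, monotone function f can be written down directly by placing mass $\mu_1(b)$ on each pair (b, f(b)). The Python sketch below illustrates this construction and Proposition 1 on a small example; the distributions, the function f, and the helper names `induced_coupling`, `marginals`, and `prob` are hypothetical choices made for the illustration.

```python
from fractions import Fraction

def induced_coupling(mu1, f):
    """Joint distribution that puts mass mu1(b) on the pair (b, f(b))."""
    return {(b, f(b)): mu1[b] for b in mu1}

def marginals(mu):
    """First and second marginals of a distribution on pairs."""
    m1, m2 = {}, {}
    for (b1, b2), pr in mu.items():
        m1[b1] = m1.get(b1, 0) + pr
        m2[b2] = m2.get(b2, 0) + pr
    return m1, m2

def prob(mu, event):
    """mu(E) for an event given as a set of outcomes."""
    return sum(pr for b, pr in mu.items() if b in event)

# A biased three-point distribution and a relabelled copy of it.
mu1 = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
mu2 = {1: Fraction(1, 3), 2: Fraction(1, 6), 3: Fraction(1, 2)}
f = {"a": 3, "b": 1, "c": 2}.get   # injective, with mu1(b) <= mu2(f(b))

mu = induced_coupling(mu1, f)
pi1, pi2 = marginals(mu)
print(pi1 == mu1 and pi2 == mu2)   # True: mu has the required marginals

# Proposition 1 with E1 = {"a", "b"} and E2 = f(E1) = {3, 1}.
E1, E2 = {"a", "b"}, {3, 1}
in_relation = all((b1 in E1) == (b2 in E2) for (b1, b2) in mu)
print(in_relation, prob(mu1, E1) == prob(mu2, E2))   # True True
```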
3.2 Program Model


Our program model uses an imperative language with probabilistic assignments, 
where we can draw a random value from primitive distributions. We consider the 
easier case of loop-free programs first; we consider looping programs in Sect. 5. 


Syntax. A (loop-free) program P is defined using the following grammar: 


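\[
\begin{array}{rll}
  P := & V \leftarrow exp & \text{(assignment)} \\
  \mid & V \sim dexp & \text{(probabilistic assignment)} \\
  \mid & \mathbf{if}\ bexp\ \mathbf{then}\ P\ \mathbf{else}\ P & \text{(conditional)} \\
  \mid & P ; P & \text{(sequential composition)}
\end{array}
\]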


where V is the set of variables that can appear in P, exp is an expression over
V, and bexp is a Boolean expression over V. A probabilistic assignment samples
from a probability distribution defined by expression dexp; for instance, dexp
might be bern(p), the Bernoulli distribution with probability p of returning true.
We use $V^I \subseteq V$ to denote the set of input program variables, which are never
assigned to. All other variables are assumed to be defined before use.

We make a few simplifying assumptions. First, distribution expressions only
mention input variables $V^I$, e.g., in the example above, bern(p), we have $p \in V^I$.
Also, all programs are well-typed and in static single assignment (SSA) form,
where each variable is assigned to only once. These assumptions are relatively
minor; they can be verified using existing tools, or lifted entirely at the cost
of slightly more complexity in our encoding.


Semantics. A state s of a program P is a valuation of all of its variables,
represented as a map from variables to values, e.g., s(x) is the value of $x \in V$
in s. We extend this mapping to expressions: s(exp) is the valuation of exp in s,
and s(dexp) is the probability distribution defined by dexp in s.

We use S to denote the set of all possible program states. As is standard
[24], we can give a semantics of P as a function $[\![P]\!] : S \to dist(S)$ from states to
distributions over states. For an output distribution $\mu = [\![P]\!](s)$, we will abuse
notation and write, e.g., $\mu(x = y)$ to denote the probability of the event that
the program returns a state s where s(x = y) = true.


Self-Composition. We will sometimes need to simulate two separate executions
of a program with a single probabilistic program. Given a program P, we use
$P_i$ to denote a program identical to P but with all variables tagged with the
subscript i. We can then define the self-composition: given a program P, the
program $P_1; P_2$ first executes $P_1$, and then executes the (separate) copy $P_2$.


3.3 Coupled Postconditions 


We are now ready to present the f-coupled postcondition, an operator for approx- 
imating the outputs of two coupled programs. 
Strongest Postcondition. We begin by defining a standard strongest
postcondition operator over single programs, treating probabilistic assignments as
no-ops. Given a set of states $Q \subseteq S$, we define post as follows:

\[
\begin{aligned}
  post(v \leftarrow exp, Q) &= \{s[v \mapsto s(exp)] \mid s \in Q\} \\
  post(v \sim dexp, Q) &= Q \\
  post(\mathbf{if}\ bexp\ \mathbf{then}\ P\ \mathbf{else}\ P', Q) &= \{s' \mid s \in Q,\ s' \in post(P, s),\ s(bexp) = true\} \\
    &\quad \cup\ \{s' \mid s \in Q,\ s' \in post(P', s),\ s(bexp) = false\} \\
  post(P; P', Q) &= post(P', post(P, Q))
\end{aligned}
\]

where $s[v \mapsto c]$ is state s with variable v mapped to the value c.
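
The definition of post translates almost line by line into a recursive function over a program syntax tree. The following Python sketch is an illustration only: the tuple-based program representation and the helper `update` are assumptions introduced for the example, and expressions are modelled as Python functions on states.

```python
def update(state, var, value):
    """The state s[v -> c]; states are frozensets of (variable, value) pairs."""
    d = dict(state)
    d[var] = value
    return frozenset(d.items())

def post(prog, states):
    """Strongest postcondition of a loop-free program over a set of states."""
    kind = prog[0]
    if kind == "assign":                 # v <- exp
        _, v, exp = prog
        return {update(s, v, exp(dict(s))) for s in states}
    if kind == "sample":                 # v ~ dexp is treated as a no-op
        return set(states)
    if kind == "if":                     # if bexp then P else P'
        _, bexp, then_p, else_p = prog
        taken = {s for s in states if bexp(dict(s))}
        return post(then_p, taken) | post(else_p, set(states) - taken)
    if kind == "seq":                    # P; P'
        _, first, second = prog
        return post(second, post(first, states))
    raise ValueError(f"unknown statement {kind!r}")

# x ~ bern(0.5); if x then y <- 1 else y <- 0
prog = ("seq", ("sample", "x"),
        ("if", lambda s: s["x"],
         ("assign", "y", lambda s: 1),
         ("assign", "y", lambda s: 0)))

# Because sampling is a no-op, both possible values of x are supplied up front.
init = {frozenset({"x": b, "y": None}.items()) for b in (True, False)}
outcomes = {(dict(s)["x"], dict(s)["y"]) for s in post(prog, init)}
print(outcomes == {(True, 1), (False, 0)})   # True
```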


f-Coupled Postcondition. We rewrite programs so that all probabilistic
assignments are combined into a single probabilistic assignment to a vector of
variables appearing at the beginning of the program, i.e., an assignment of the
form $v \sim dexp$ in P and $v' \sim dexp'$ in P', where v, v' are vectors of variables. For
instance, we can combine x ~ bern(0.5); y ~ bern(0.5) into the single statement
x, y ~ bern(0.5) × bern(0.5).

Let B, B' be the domains of v and v', $f : B \to B'$ be a function, and
$Q \subseteq S \times S'$ be a set of pairs of input states, where S and S' are the states of P
and P', respectively. We define the f-coupled postcondition operator cpost as

\[
\begin{aligned}
  & cpost(P, P', Q, f) = \{(post(P, s), post(P', s')) \mid (s, s') \in Q'\} \\
  & \text{where } Q' = \{(s[v \mapsto b], s'[v' \mapsto f(b)]) \mid (s, s') \in Q,\ b \in B\}, \\
  & \text{assuming that } \forall (s, s') \in Q.\ s(dexp) \leftrightsquigarrow^{f} s'(dexp').
\end{aligned}
\tag{2}
\]

The intuition is that the values drawn from sampling assignments in both
programs are coupled using the function f. Note that this operation
non-deterministically assigns v from P with some values b, and v' with f(b). Then,
the operation simulates the executions of the two programs. Formula 2 states
that there is an f-coupling for every instantiation of the two distributions used
in probabilistic assignments in both programs.


Example 1. Consider the simple program P defined as $x \sim \mathrm{bern}(0.5);\ x = \neg x$
and let $f_\neg(x) = \neg x$. Then, $cpost(P, P, Q, f_\neg)$ is $\{(s, s') \mid s(x) = \neg s'(x)\}$.
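
Example 1 can be replayed by brute force. In the Python sketch below, a state is collapsed to the value of x, and the deterministic part of each program (the statement after the sampling instruction) is modelled as a Python function; both simplifications, and the signature of the helper `cpost`, are assumptions made for this illustration. The side condition of Formula 2 is not checked here.

```python
BOOLS = (True, False)

def cpost(det, det1, Q, f, domain):
    """f-coupled postcondition for programs of the form `v ~ dexp; det`.
    The sample b is chosen non-deterministically in the first program and
    set to f(b) in the second; det and det1 are the deterministic suffixes."""
    return {(det(b), det1(f(b))) for _ in Q for b in domain}

# Example 1: x ~ bern(0.5); x = not x, coupled with itself through f_neg.
negate = lambda x: not x
f_neg = lambda x: not x
Q = {("s", "s")}    # a single pair of input states; their content is irrelevant

result = cpost(negate, negate, Q, f_neg, BOOLS)
print(result == {(True, False), (False, True)})   # True: s(x) = not s'(x)
```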


The main soundness theorem shows there is a probabilistic coupling of the
output distributions with support contained in the coupled postcondition (we 
defer all proofs to the full version of this paper). 


Theorem 1. Let programs P and P' be of the form $v \sim dexp; P_D$ and
$v' \sim dexp'; P'_D$, for deterministic programs $P_D, P'_D$. Given a function $f : B \to B'$
satisfying Formula 2, for every $(s, s') \in S \times S'$ we have $[\![P]\!](s) \leftrightsquigarrow^{\Psi} [\![P']\!](s')$,
where $\Psi = cpost(P, P', (s, s'), f)$.


3.4 Proof Rules for Uniformity and Independence 


We are now ready to demonstrate how to establish uniformity and independence 
of program variables using f-coupled postconditions. We will continue to assume 
that random sampling commands have been lifted to the front of each program,
and that f satisfies Formula 2. 


Uniformity. Consider a program P and a variable $v^* \in V$ of finite, non-empty
domain B. Let $\mu = [\![P]\!](s)$ for some state $s \in S$. We say that variable $v^*$ is
uniformly distributed in $\mu$ if $\mu(v^* = b) = \frac{1}{|B|}$ for every $b \in B$.

The following theorem connects uniformity with f-coupled postconditions.


Theorem 2 (Uniformity). Consider a program P with $v \sim dexp$ as its first
statement and a designated return variable $v^* \in V$ with domain B. Let
$Q = \{(s, s) \mid s \in S\}$ be the input relation. If we have

\[
  \exists f.\ cpost(P, P, Q, f) \subseteq \{(s, s') \in S \times S \mid s(v^*) = b \iff s'(v^*) = b'\}
\]

for all $b, b' \in B$, then for any input $s \in S$ the final value of $v^*$ is uniformly
distributed over B in $[\![P]\!](s)$.


The intuition is that in the two f-coupled copies of P, the first $v^*$ is equal to
b exactly when the second $v^*$ is equal to b'. Hence, the probability of returning
b in the first copy and b' in the second copy are the same. Repeating for every
pair of values b, b', we conclude that $v^*$ is uniformly distributed.


Example 2. Recall Example 1 and let b = true and b' = false. We have
$cpost(P, P, Q, f_\neg) \subseteq \{(s, s') \in S \times S \mid s(x) = b \iff s'(x) = b'\}$.


This is sufficient to prove uniformity (the case with b = b’ is trivial). 


Independence. We now present a proof rule for independence. Consider a
program P and two variables $v^*, w^* \in V$ with domains B and B', respectively. Let
$\mu = [\![P]\!](s)$ for some state $s \in S$. We say that $v^*, w^*$ are probabilistically
independent in $\mu$ if $\mu(v^* = b \land w^* = b') = \mu(v^* = b) \cdot \mu(w^* = b')$ for every $b \in B$ and
$b' \in B'$.
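
The definition of independence can be checked directly on a small example by computing the exact output distribution. The Python sketch below does this for a hypothetical program chosen for illustration (x ~ bern(0.5); y ~ bern(0.5); v ← x; w ← x xor y); it checks the definition by enumeration and is not the coupling-based proof rule developed in this section.

```python
from fractions import Fraction
from itertools import product

BOOLS = (True, False)

def output_distribution():
    """Exact output distribution over (v, w) for the hypothetical program
    x ~ bern(0.5); y ~ bern(0.5); v <- x; w <- x xor y."""
    mu = {}
    for x, y in product(BOOLS, repeat=2):
        out = (x, x != y)
        mu[out] = mu.get(out, 0) + Fraction(1, 4)
    return mu

def independent(mu):
    """mu(v = b and w = b') == mu(v = b) * mu(w = b') for all b, b'."""
    pv = lambda b: sum(p for (v, _), p in mu.items() if v == b)
    pw = lambda b: sum(p for (_, w), p in mu.items() if w == b)
    return all(mu.get((b, b2), 0) == pv(b) * pw(b2)
               for b, b2 in product(BOOLS, repeat=2))

print(independent(output_distribution()))   # True
```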

The following theorem connects independence with f-coupled postconditions. 
We will self-compose two tagged copies of P, called $P_1$ and $P_2$.


Theorem 3 (Independence). Assume a program P and define the relation

\[
  Q = \{(s, s_1 \oplus s_2) \mid s \in S,\ s_i \in S_i,\ s(v) = s_i(v_i) \text{ for all } v \in V^I\},
\]

where $\oplus$ takes the union of two maps with disjoint domains. Fix some $v^*, w^* \in V$
with domains B, B', and assume that for all $b \in B$, $b' \in B'$, there exists a
function f such that $cpost(P, (P_1; P_2), Q, f)$ is contained in

\[
  \{(s', s'_1 \oplus s'_2) \mid s'(v^*) = b \land s'(w^*) = b' \iff s'_1(v^*_1) = b \land s'_2(w^*_2) = b'\}.
\]

Then, $v^*, w^*$ are independently distributed in $[\![P]\!](s)$ for all inputs $s \in S$.


The idea is that under the coupling, the probability of P returning
$v^* = b \land w^* = b'$ is the same as the probability of $P_1$ returning $v^* = b$ and $P_2$ returning
$w^* = b'$, for all values b, b'. Since $P_1$ and $P_2$ are two independent executions of
P by construction, this establishes independence of $v^*$ and $w^*$.
4 Constraint-Based Formulation of Proof Rules


In Sect. 3, we formalized the problem of constructing a coupling proof using
f-coupled postconditions. We now automatically find such proofs by posing the
problem as a constraint, where a solution gives a function f establishing our 
desired property. 


4.1 Generating Logical and Probabilistic Constraints 


Logical Encoding. We first encode program executions as formulas in first- 
order logic, using the following encoding function: 


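\[
\begin{aligned}
  enc(v \leftarrow exp) &\triangleq v = exp \\
  enc(v \sim dexp) &\triangleq true \\
  enc(\mathbf{if}\ bexp\ \mathbf{then}\ P\ \mathbf{else}\ P') &\triangleq (bexp \Rightarrow enc(P)) \land (\neg bexp \Rightarrow enc(P')) \\
  enc(P; P') &\triangleq enc(P) \land enc(P')
\end{aligned}
\]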


We assume a direct correspondence between expressions in our language and 
the first-order theory used for our encoding, e.g., linear arithmetic. Note that 
the encoding disregards probabilistic assignments, encoding them as true; this 
mimics the semantics of our strongest postcondition operator post. Probabilistic 
assignments will be handled via a separate encoding of f-couplings. 

As expected, enc reflects the strongest postcondition post. 


Lemma 1. Let P be a program and let $\rho$ be any assignment of the variables. An
assignment $\rho'$ agreeing with $\rho$ on all input variables $V^I$ satisfies the constraint
$enc(P)[\rho'/V]$ precisely when $post(P, \{\rho\}) = \{\rho'\}$, treating $\rho, \rho'$ as program states.
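
The encoding function can be prototyped without a solver by representing a formula as a Python predicate over assignments and enumerating its models. This is a sketch under stated assumptions: the tuple-based program syntax is hypothetical, and brute-force enumeration over Booleans stands in for a first-order theory.

```python
from itertools import product

def enc(prog):
    """Encode a loop-free SSA program as a predicate over assignments."""
    kind = prog[0]
    if kind == "assign":                       # v <- exp  becomes  v = exp
        _, v, exp = prog
        return lambda rho: rho[v] == exp(rho)
    if kind == "sample":                       # v ~ dexp  becomes  true
        return lambda rho: True
    if kind == "if":                           # two guarded implications
        _, bexp, then_p, else_p = prog
        t, e = enc(then_p), enc(else_p)
        return lambda rho: ((not bexp(rho) or t(rho)) and
                            (bexp(rho) or e(rho)))
    if kind == "seq":                          # conjunction
        _, first, second = prog
        a, b = enc(first), enc(second)
        return lambda rho: a(rho) and b(rho)
    raise ValueError(f"unknown statement {kind!r}")

# x ~ bern(0.5); y <- not x
prog = ("seq", ("sample", "x"), ("assign", "y", lambda r: not r["x"]))

phi = enc(prog)
models = [(x, y) for x, y in product((True, False), repeat=2)
          if phi({"x": x, "y": y})]
print(models)    # [(True, False), (False, True)]
```

The two models are exactly the states in which y is the negation of the sampled x, which is what post yields for this program when x ranges over both Boolean values.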


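Uniformity Constraints. We can encode the conditions in Theorem 2 for showing
uniformity as a logical constraint. For a program P and a copy $P_1$, with first
statements $v \sim dexp$ and $v_1 \sim dexp_1$, we define the constraints:

\[
\begin{aligned}
  & \forall a, a'.\ \exists f.\ \forall V, V_1. \\
  & \quad (V^I = V^I_1 \land v_1 = f(v) \land enc(P) \land enc(P_1)) \Rightarrow (v^* = a \iff v^*_1 = a') && (3) \\
  & \quad V^I = V^I_1 \Rightarrow dexp \leftrightsquigarrow^{f} dexp_1 && (4)
\end{aligned}
\]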


Note that this is a second-order formula, as it quantifies over the uninterpreted 
function f. The left side of the implication in Formula 3 encodes an f-coupled
execution of P and $P_1$ starting from equal initial states. The right side of this
implication encodes the conditions for uniformity, as in Theorem 2.

Formula 4 ensures that there is an f-coupling between dexp and $dexp_1$ for any
initial state; recall that dexp may mention input variables $V^I$. The constraint
$dexp \leftrightsquigarrow^{f} dexp_1$ is not a standard logical constraint; intuitively, it is satisfied if
$dexp \leftrightsquigarrow^{f} dexp_1$ holds for some interpretation of f, dexp, and $dexp_1$.


Example 3. The constraint 


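\[ \exists f.\ \forall p, p'.\ p = p' \Rightarrow \mathrm{bern}(p) \leftrightsquigarrow^{f} \mathrm{bern}(p') \]


holds by setting f to the identity function id, since for any p = p' we have an
f-coupling $\mathrm{bern}(p) \leftrightsquigarrow^{id} \mathrm{bern}(p')$.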


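Example 4. Consider the program $x \sim \mathrm{bern}(0.5);\ y = \neg x$. The constraints for
uniformity of y are

\[
\begin{aligned}
  & \forall a, a'.\ \exists f.\ \forall V, V_1.\ (x_1 = f(x) \land y = \neg x \land y_1 = \neg x_1) \Rightarrow (y = a \iff y_1 = a') \\
  & \mathrm{bern}(0.5) \leftrightsquigarrow^{f} \mathrm{bern}(0.5).
\end{aligned}
\]

Since there are no input variables, $V^I = V^I_1$ is equivalent to true.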
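
Because every variable in Example 4 is Boolean, the validity of its constraints can be confirmed by exhaustive search over the four functions from Booleans to Booleans. The Python sketch below is an illustration only; it replaces the solver with enumeration, and the helper names are hypothetical.

```python
from itertools import product

BOOLS = (True, False)
# All four functions f : B -> B, as lookup tables.
FUNCTIONS = [dict(zip(BOOLS, image)) for image in product(BOOLS, repeat=2)]

def is_coupling(f):
    """bern(0.5) coupled with bern(0.5) through f: injectivity suffices,
    because monotonicity 0.5 <= 0.5 holds for every f."""
    return f[True] != f[False]

def satisfies(f, a, a1):
    """Check the implication of Example 4 for all x, x1, y, y1."""
    for x, x1, y, y1 in product(BOOLS, repeat=4):
        lhs = (x1 == f[x]) and (y == (not x)) and (y1 == (not x1))
        rhs = (y == a) == (y1 == a1)
        if lhs and not rhs:
            return False
    return True

def solve(a, a1):
    """Return some f witnessing the constraints for (a, a1), or None."""
    for f in FUNCTIONS:
        if is_coupling(f) and satisfies(f, a, a1):
            return f
    return None

witnesses = {(a, a1): solve(a, a1) for a, a1 in product(BOOLS, repeat=2)}
print(all(f is not None for f in witnesses.values()))   # True
print(witnesses[(True, False)])   # {True: False, False: True}, i.e. negation
```

For equal target values the identity function is found, and for distinct target values the negation is found, matching the role of $f_\neg$ in Example 2.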


Theorem 4 (Uniformity constraints). Fix a program P and variable $v^* \in V$.
Let $\varphi$ be the uniformity constraints in Formulas 3 and 4. If $\varphi$ is valid, then
$v^*$ is uniformly distributed in $[\![P]\!](s)$ for all $s \in S$.


Independence Constraints. Similarly, we can characterize independence
constraints using the conditions in Theorem 3. After transforming the program
$P_1; P_2$ to start with the single probabilistic assignment statement $v_{1,2} \sim dexp_{1,2}$,
combining probabilistic assignments in $P_1$ and $P_2$, we define the constraints:

\[
\begin{aligned}
  & \forall a, a'.\ \exists f.\ \forall V, V_1, V_2. \\
  & \quad (V^I = V^I_1 = V^I_2 \land v_{1,2} = f(v) \land enc(P) \land enc(P_1; P_2)) && (5) \\
  & \qquad \Rightarrow (v^* = a \land w^* = a' \iff v^*_1 = a \land w^*_2 = a') \\
  & \quad V^I = V^I_1 = V^I_2 \Rightarrow dexp \leftrightsquigarrow^{f} dexp_{1,2} && (6)
\end{aligned}
\]


Theorem 5 (Independence constraints). Fix a program P and two variables
$v^*, w^* \in V$. Let $\varphi$ be the independence constraints from Formulas 5 and 6.
If $\varphi$ is valid, then $v^*, w^*$ are independent in $[\![P]\!](s)$ for all $s \in S$.


4.2 Constraint Transformation 


To solve our constraints, we transform them into the form $\exists f.\ \forall X.\ \varphi$,
where $\varphi$ is a first-order formula. Such formulas can be viewed as synthesis
problems, and are often solvable automatically using standard techniques.

We perform our transformation in two steps. First, we transform our
constraint into the form $\exists f.\ \forall X.\ \varphi_p$, where $\varphi_p$ still contains the coupling constraint.
Then, we replace the coupling constraint with a first-order formula by logically
encoding primitive distributions as uninterpreted functions.
Quantifier Reordering. Our constraints are of the form $\forall a, a'.\ \exists f.\ \forall X.\ \varphi$.
Intuitively, this means that for every possible value of a, a', we want one function f
satisfying $\forall X.\ \varphi$. We can pull the existential quantifier $\exists f$ to the outermost level
by extending the function with additional parameters for a, a', thus defining a
different function for every interpretation of a, a'. For the uniformity constraints
this transformation yields the following formulas:

\[
\begin{aligned}
  & \exists g.\ \forall a, a'.\ \forall V, V_1. \\
  & \quad (V^I = V^I_1 \land v_1 = g(a, a', v) \land enc(P) \land enc(P_1)) \Rightarrow (v^* = a \iff v^*_1 = a') && (7) \\
  & \quad V^I = V^I_1 \Rightarrow dexp \leftrightsquigarrow^{g(a, a', -)} dexp_1 && (8)
\end{aligned}
\]

where $g(a, a', -)$ is the function after partially applying g.


Transforming Coupling Constraints. Our next step is to eliminate coupling
constraints. To do so, we use the definition of f-coupling, which states that
$\mu_1 \leftrightsquigarrow^{f} \mu_2$ if (i) f is injective and (ii) $\forall x.\ \mu_1(x) \le \mu_2(f(x))$. The first constraint
(injectivity) is straightforward. For the second point (monotonicity), we can
encode distribution expressions (which represent functions to reals) as
uninterpreted functions, which we then further constrain. For instance, the coupling
constraint $\mathrm{bern}(p) \leftrightsquigarrow^{f} \mathrm{bern}(p')$ can be encoded as

\[
\begin{array}{ll}
  \forall x, y.\ x \ne y \Rightarrow f(x) \ne f(y) & \text{(injectivity)} \\
  \forall x.\ h(x) \le h'(f(x)) & \text{(monotonicity)} \\
  \forall x.\ ite(x = true,\ h(x) = p,\ h(x) = 1 - p) & \text{(bern}(p)\text{ encoding)} \\
  \forall x.\ ite(x = true,\ h'(x) = p',\ h'(x) = 1 - p') & \text{(bern}(p')\text{ encoding)}
\end{array}
\]

where $h, h' : \mathbb{B} \to \mathbb{R}^{\ge 0}$ are uninterpreted functions representing the probability
mass functions of bern(p) and bern(p'); note that the third constraint encodes
the distribution bern(p), which returns true with probability p and false with
probability 1 − p, and the fourth constraint encodes bern(p').
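
The four constraints above can be evaluated concretely once p, p', and a candidate f are fixed. The Python sketch below is an illustration under stated assumptions: the functions h and h' are given their intended interpretation directly rather than being left uninterpreted, and the helper names are hypothetical.

```python
from fractions import Fraction

BOOLS = (True, False)

def bern_pmf(p):
    """The interpretation forced by ite(x = true, h(x) = p, h(x) = 1 - p)."""
    return lambda x: p if x else 1 - p

def coupling_constraints_hold(f, p, p1):
    h, h1 = bern_pmf(p), bern_pmf(p1)
    injective = all(f(x) != f(y) for x in BOOLS for y in BOOLS if x != y)
    monotone = all(h(x) <= h1(f(x)) for x in BOOLS)
    return injective and monotone

identity = lambda x: x
negation = lambda x: not x
p = Fraction(1, 3)

print(coupling_constraints_hold(identity, p, p))       # True
print(coupling_constraints_hold(negation, p, p))       # False
print(coupling_constraints_hold(negation, p, 1 - p))   # True
```

The second call fails because negation would have to map false (probability 2/3) to true (probability 1/3), violating monotonicity; the third succeeds because the two biases are mirrored.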

Note that if we cannot encode the definition of the distribution in our first- 
order theory (e.g., if it requires non-linear constraints), or if we do not have a 
concrete description of the distribution, we can simply elide the last two con- 
straints and under-constrain h and h’. In Sect. 6 we use this feature to prove 
properties of a program encoding a Bayesian network, where the primitive dis- 
tributions are unknown program parameters. 


Theorem 6 (Transformation soundness). Let $\varphi$ be the constraints generated
for some program P. Let $\varphi'$ be the result of applying the above transformations
to $\varphi$. If $\varphi'$ is valid, then $\varphi$ is valid.


Constraint Solving. After performing these transformations, we finally arrive
at constraints of the form $\exists g.\ \forall a, a'.\ \forall V.\ \varphi$, where $\varphi$ is a first-order formula. These
exactly match constraint-based program synthesis problems. In Sect. 6, we use
SMT solvers and enumerative synthesis to handle these constraints.
5 Dealing with Loops


So far, we have only considered loop-free programs. In this section, we extend
our approach to programs with loops.


f-Coupled Postconditions and Loops. We consider programs of the form

\[ \mathbf{while}\ bexp\ P^b \]

where $P^b$ is a loop-free program that begins with the statement $v \sim dexp$; our
technique can also be extended to handle nested loops. We assume all programs
terminate with probability 1 for any initial state; there are numerous systems for
verifying this basic property automatically (see, e.g., [15-17]). To extend our
f-coupled postconditions, we let cpost(P, P', Q, f) be the smallest set I satisfying:

\[
\begin{array}{ll}
  Q \subseteq I & \text{(initiation)} \\
  cpost(P^b, P'^b, I_{en}, f) \subseteq I & \text{(consecution)} \\
  I \subseteq \{(s, s') \in S \times S' \mid s(bexp) = s'(bexp')\} & \text{(synchronization)}
\end{array}
\]

where $I_{en} \triangleq \{(s, s') \in I \mid s(bexp) = true\}$.

Intuitively, the set I is the least inductive invariant for the two coupled
programs running with synchronized loops. Theorem 1, which establishes that 
f-coupled postconditions result in couplings over output distributions, naturally 
extends to a setting with loops. 
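
For a finite state space, the least set closed under initiation and consecution can be computed by fixpoint iteration, after which synchronization is checked separately. The Python sketch below does this for fairCoin with the swap function; it is an illustration only, and it hard-codes the fact that fairCoin's loop body merely resamples (x, y), an assumption that would not hold for other programs.

```python
from itertools import product

BOOLS = (True, False)
STATES = list(product(BOOLS, repeat=2))       # a state is the pair (x, y)

guard = lambda s: s[0] == s[1]                # loop guard x = y
f_swap = lambda s: (s[1], s[0])

def loop_cpost(Q, f):
    """Least set I containing Q and closed under one coupled iteration.
    The loop body of fairCoin only resamples (x, y), so a coupled iteration
    from any pair with a true guard can reach (b, f(b)) for every b."""
    I = set(Q)
    while True:
        enabled = {(s, s1) for (s, s1) in I if guard(s)}
        step = {(b, f(b)) for _ in enabled for b in STATES}
        if step <= I:
            return I
        I |= step

init = (False, False)
I = loop_cpost({(init, init)}, f_swap)

# Synchronization is not enforced by the iteration; it is checked afterwards.
synchronized = all(guard(s) == guard(s1) for s, s1 in I)
# On exit, x is true in the first copy iff x is false in the second.
exit_ok = all((s[0] is True) == (s1[0] is False)
              for s, s1 in I if not guard(s))
print(synchronized, exit_ok)   # True True
```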


Constraint Generation. To prove uniformity, we generate constraints much
like the loop-free case except that we capture the invariant I, modeled as a
relation over the variables of both programs, using a Constrained Horn-Clause
(CHC) encoding. As is standard, we use $V', V'_1$ to denote primed copies of program
variables denoting their value after executing the body, and we assume that
$enc(P^b)$ encodes a loop-free program as a transition relation from states over V
to states over V'.

\[
\begin{array}{ll}
  \forall a, a'.\ \exists f, I.\ \forall V, V_1, V', V'_1. & \\
  \quad V^I = V^I_1 \Rightarrow I(V, V_1) & \text{(initiation)} \\
  \quad I(V, V_1) \land bexp \land v'_1 = f(v') \land enc(P^b) \land enc(P^b_1) \Rightarrow I(V', V'_1) & \text{(consecution)} \\
  \quad I(V, V_1) \Rightarrow bexp = bexp_1 & \text{(synchronization)} \\
  \quad I(V, V_1) \Rightarrow dexp \leftrightsquigarrow^{f} dexp_1 & \text{(coupling)} \\
  \quad I(V, V_1) \land \neg bexp \Rightarrow (v^* = a \iff v^*_1 = a') & \text{(uniformity)}
\end{array}
\]

The first three constraints encode the definition of cpost; the last two ensure
that f constructs a coupling and that the invariant implies the uniformity
condition when the loop terminates. Using the technique presented in Sect. 4.2, we
can transform these constraints into the form $\exists f, I.\ \forall X.\ \varphi$. That is, in addition
to discovering the function f, we need to discover the invariant I.

Proving independence in looping programs poses additional challenges, as 
directly applying the self-composition construction from Sect. 3 requires relating 
a single loop with two loops. When the number of loop iterations is deterministic,
however, we may simulate two sequentially composed loops with a single loop
that interleaves the iterations (known as synchronized or cross product [4,29])
so that we reduce the synthesis problem to finding a coupling for two loops.


6 Implementation and Evaluation 


We now discuss our implementation and five case studies used for evaluation. 


    fun fairCoin(p ∈ (0, 1))
      x ← false
      y ← false
      while x = y do
        x ~ bern(p)
        y ~ bern(p)
      return x

    fun fairDie
      x ← false
      y ← false
      z ← false
      while x = y = z do
        x ~ bern(0.5)
        y ~ bern(0.5)
        z ~ bern(0.5)
      return (x, y, z)

    fun noisySum(n, p ∈ (0, 1))
      sum ← 0
      for i = 1, ..., n do
        noise[i] ~ bern(p)
        sum ← sum + noise[i]
      return sum

    fun bayes(μ, μ', μ'')
      x ~ μ
      y ~ μ'
      z ~ μ''
      w ← f(x, y)
      w' ← g(y, z)
      return (w, w')

    fun ballot(n)
      tie ← false
      x_A ← 0
      x_B ← 0
      for i = 1, ..., n do
        r ~ bern(0.5)
        if r = 0 then
          x_A ← x_A + 1
        else
          x_B ← x_B + 1
        if i = 1 then
          first ← r
        if x_A = x_B then
          tie ← true
      return (first, tie)

Fig. 2. Case study programs


Implementation. To solve formulas of the form 4f.VX.y, we implemented a 
simple solver using a guess-and-check loop: We iterate through various interpre- 
tations of f, insert them into the formula, and check whether the resulting for- 
mula is valid. In the simplest case, we are searching for a function f from n-tuples 
to n-tuples. For instance, in Sect. 2.2, we discovered the function f(x,y) = (y, x). 
Our implementation is parameterized by a grammar defining an infinite set of 
interpretations of f, which involves permuting the arguments (as above), con- 
ditionals, and other basic operations (e.g., negation for Boolean variables). For 
checking validity of $\forall X.\ \varphi$ given $f$, we use the Z3 SMT solver [19] for loop-free
programs. For loops, we use an existing constrained-Horn-clause solver based on
the MathSAT SMT solver [18].
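
As a rough illustration of the guess-and-check loop, the following Python sketch searches a tiny candidate list for a coupling function for fairCoin. Everything in it is an assumption made for illustration: the candidate list stands in for the grammar, the brute-force check over Booleans stands in for the SMT validity query, and the three conditions tested are a simplified reading of what a coupling for fairCoin must satisfy, not the constraint system of Sect. 4.

```python
from itertools import product

BOOLS = (False, True)
PAIRS = list(product(BOOLS, repeat=2))

# Hypothetical candidate "grammar": a few maps on pairs of Booleans.
CANDIDATES = [
    ("identity", lambda x, y: (x, y)),
    ("negate_both", lambda x, y: (not x, not y)),
    ("swap", lambda x, y: (y, x)),
]

def is_valid(f):
    """Brute-force stand-in for the validity check of the formula given f."""
    images = [f(x, y) for x, y in PAIRS]
    # f must be a bijection on the sample space.
    if sorted(images) != sorted(PAIRS):
        return False
    for (x, y), (x2, y2) in zip(PAIRS, images):
        # Preserve the number of True samples, so the probability under
        # bern(p) x bern(p) is preserved for every parameter p.
        if x + y != x2 + y2:
            return False
        # Both runs must leave the loop in the same iteration.
        if (x == y) != (x2 == y2):
            return False
        # On exit, the two runs must return opposite values.
        if x != y and x2 == x:
            return False
    return True

def guess_and_check(candidates):
    """Return the first valid candidate and the number of guesses used."""
    for guesses, (name, f) in enumerate(candidates, start=1):
        if is_valid(f):
            return name, guesses
    return None, len(candidates)

print(guess_and_check(CANDIDATES))
```

With this candidate order, the identity fails the exit condition and negating both samples fails the probability-preservation condition, so the search settles on the swap.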


Benchmarks and Results. As a set of case studies for our approach, we use 
5 different programs collected from the literature and presented in Fig. 2. For
these programs, we prove uniformity, (conditional) independence properties, and 
other probabilistic equalities. For instance, we use our implementation to prove 
a main lemma for the Ballot theorem [20], encoded as the program ballot. 
Figure 3 shows the time and number of loop iterations required by our implementation
to discover a coupling proof. The small number of iterations and time
needed demonstrates the simplicity of the discovered proofs. For instance, the
ballot theorem was proved in 3 s and only 4 iterations, while the fairCoin example
(illustrated in Sect. 2.2) required only two iterations and 1.4 s. In all cases, the
size of the synthesized function f in terms of depth of its AST is no more than 4.
We describe these programs and properties in a bit more detail.


Case Studies: Uniformity (fairCoin, fairDie). The first two programs pro- 
duce uniformly random values. Our approach synthesizes a coupling proof cer- 
tifying uniformity for both of these programs. The first program fairCoin, which 
we saw in Sect. 2.2, produces a fair coin flip given access to biased coin flips by 
repeatedly flipping two coins while they are equal, and returning the result of 
the first coin as soon as the flips differ. Note that the bias of the coin flip is a 
program parameter, and not fixed statically. The synthesized coupling swaps the 
result of the two samples, mapping the values of (x, y) to (y, x). 
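
The effect of the swap coupling can be checked directly on small instances. The Python sketch below is an independent sanity check, not part of the synthesis procedure: it computes the output distribution of fairCoin exactly, using the fact that the returned value is distributed as x conditioned on the loop exiting (x ≠ y). The function name and the chosen values of p are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def fair_coin_output_dist(p):
    """Exact output distribution of fairCoin for a bias p in (0, 1)."""
    # Probability of each sample pair (x, y) in one loop iteration.
    pr = {
        (x, y): (p if x else 1 - p) * (p if y else 1 - p)
        for x, y in product((False, True), repeat=2)
    }
    # The swap (x, y) -> (y, x) preserves probability for every p.
    assert all(pr[(x, y)] == pr[(y, x)] for x, y in pr)
    # The loop exits exactly when x != y; the output is x on exit.
    exit_mass = sum(m for (x, y), m in pr.items() if x != y)
    return {
        out: sum(m for (x, y), m in pr.items() if x != y and x == out) / exit_mass
        for out in (False, True)
    }

for p in (Fraction(1, 3), Fraction(1, 10), Fraction(9, 10)):
    print(p, fair_coin_output_dist(p))
```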

The second program fairDie gives a different construction for simulating a
roll of a fair die given fair coin flips. Three fair coins are repeatedly flipped
as long as they are all equal; the returned triple is the binary representation
of a number in $\{1, \ldots, 6\}$, the result of the simulated roll. The synthesized
coupling is a bijection on triples of booleans $\mathbb{B} \times \mathbb{B} \times \mathbb{B}$;
fixing any two possible output triples $(b_1, b_2, b_3)$ and $(b_1', b_2', b_3')$ of
distinct booleans, the coupling maps $(b_1, b_2, b_3) \mapsto (b_1', b_2', b_3')$ and
vice versa, leaving all other triples unchanged.


Program    Iters.  Time(s)
fairCoin   2       1.4
fairDie    [illegible in source]
noisySum   [illegible in source]
bayes      5       0.4
ballot     4       3.0


Fig. 3. Statistics


Case Studies: Independence (noisySum, bayes). In the next two programs, 
our approach synthesizes coupling proofs of independence and conditional inde- 
pendence of program variables in the output distribution. The first program, 
noisySum, is a stylized program inspired by privacy-preserving algorithms that
sum a series of noisy samples; for giving accuracy guarantees, it is often impor- 
tant to show that the noisy draws are probabilistically independent. We show 
that any pair of samples are independent. 

The second program, bayes, models a simple Bayesian network with three
independent variables x, y, z and two dependent variables w and w′, computed
from (x, y) and (y, z) respectively. We want to show that w and w′ are independent
conditioned on any value of y; intuitively, w and w′ only depend on each
other through the value of y, and are independent otherwise. We use a constraint
encoding similar to the encoding for showing independence to find a coupling
proof of this fact. Note that the distributions μ, μ′, μ″ of x, y, z are unknown
parameters, and the functions f and g are also uninterpreted. This illustrates
the advantage of using a constraint-based technique: we can encode unknown
distributions and operations as uninterpreted functions.
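
The conditional independence claim for bayes can be spot-checked on concrete instances. The Python sketch below fixes particular finite distributions and particular functions f and g (all of them illustrative assumptions, whereas the proof in the text treats them as unknown) and verifies by exact enumeration that Pr[w, w′ | y] = Pr[w | y] · Pr[w′ | y] for every value of y.

```python
from fractions import Fraction
from itertools import product

def conditionally_independent(mu_x, mu_y, mu_z, f, g):
    """Check that w = f(x, y) and w' = g(y, z) are independent given y."""
    joint = {}  # maps (y, w, w') to its probability
    for (x, px), (y, py), (z, pz) in product(
        mu_x.items(), mu_y.items(), mu_z.items()
    ):
        key = (y, f(x, y), g(y, z))
        joint[key] = joint.get(key, 0) + px * py * pz
    for y, py in mu_y.items():
        w_vals = {w for (yy, w, _) in joint if yy == y}
        wp_vals = {wp for (yy, _, wp) in joint if yy == y}
        for w, wp in product(w_vals, wp_vals):
            p_both = joint.get((y, w, wp), 0) / py
            p_w = sum(m for (yy, a, _), m in joint.items()
                      if yy == y and a == w) / py
            p_wp = sum(m for (yy, _, b), m in joint.items()
                       if yy == y and b == wp) / py
            if p_both != p_w * p_wp:
                return False
    return True

# Hypothetical instance: biased bits, with f = xor and g = product.
mu_x = {0: Fraction(1, 2), 1: Fraction(1, 2)}
mu_y = {0: Fraction(1, 3), 1: Fraction(2, 3)}
mu_z = {0: Fraction(1, 4), 1: Fraction(3, 4)}
print(conditionally_independent(mu_x, mu_y, mu_z,
                                lambda x, y: x ^ y, lambda y, z: y * z))
```

A check like this covers only the instances enumerated; it does not replace the coupling proof, which holds for all distributions and all f and g.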


Case Studies: Probabilistic Equalities (ballot). As we mentioned in Sect. 1, 
our approach extends naturally to proving general probabilistic equalities beyond 
uniformity and independence. To illustrate, we consider a lemma used to prove 
Bertrand's Ballot theorem [20]. Roughly speaking, this theorem considers counting
ballots one-by-one in an election where there are $n_A$ votes cast for candidate
A and $n_B$ votes cast for candidate B, where $n_A, n_B$ are parameters. If $n_A > n_B$
(so A is the winner) and votes are counted in a uniformly random order, the
Ballot theorem states that the probability that A leads throughout the whole
counting process, without any ties, is precisely $(n_A - n_B)/(n_A + n_B)$.
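
For small parameters the statement of the Ballot theorem can be confirmed by brute force. The Python sketch below enumerates all distinct counting orders, which are equally likely under a uniformly random order, and compares the fraction in which A is strictly ahead after every counted vote with $(n_A - n_B)/(n_A + n_B)$. The function name is an illustrative choice.

```python
from fractions import Fraction
from itertools import permutations

def prob_a_always_leads(n_a, n_b):
    """Exact probability that A is strictly ahead after every counted vote."""
    orders = set(permutations("A" * n_a + "B" * n_b))
    good = 0
    for order in orders:
        lead = 0
        for vote in order:
            lead += 1 if vote == "A" else -1
            if lead <= 0:  # a tie or B ahead: A does not lead throughout
                break
        else:
            good += 1
    return Fraction(good, len(orders))

for n_a, n_b in [(3, 2), (4, 1), (5, 3)]:
    print(n_a, n_b, prob_a_always_leads(n_a, n_b), Fraction(n_a - n_b, n_a + n_b))
```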

One way of proving this theorem, sometimes called André's reflection princi- 
ple, is to show that the probability of counting the first vote for A and reaching 
a tie is equal to the probability of counting the first vote for B and reaching 
a tie. We simulate the counting process slightly differently—instead of drawing 
a uniform order to count the votes, our program draws uniform samples for 
votes—but the original target property is equivalent to the equality 


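$$\Pr[\mathit{first}_1 = 0 \land \mathit{tie}_1 \land \psi(x_{A1}, x_{B1})] = \Pr[\mathit{first}_2 = 1 \land \mathit{tie}_2 \land \psi(x_{A2}, x_{B2})] \qquad (9)$$


where $\psi(x_{Ai}, x_{Bi})$ is $x_{Ai} = n_A \land x_{Bi} = n_B$. Our approach synthesizes a coupling
and loop invariant showing that the coupled post-condition is contained in


$$\{(s_1, s_2) \mid s_1(\mathit{first} = 0 \land \mathit{tie} \land \psi(x_A, x_B)) \implies s_2(\mathit{first} = 1 \land \mathit{tie} \land \psi(x_A, x_B))\}$$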


giving Formula (9) by Proposition 1 (see Barthe et al. [6] for more details). 
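
Equality (9) can also be checked by enumeration for a fixed n. The Python sketch below transcribes the ballot program from Fig. 2, runs it on every sequence of n fair samples (each sequence has probability $2^{-n}$, so comparing counts is the same as comparing probabilities), and counts the outcomes on each side of the equality. The helper names and the chosen parameters are illustrative assumptions.

```python
from itertools import product

def ballot(samples):
    """Run the ballot program on a fixed sequence of samples r in {0, 1}."""
    tie, x_a, x_b, first = False, 0, 0, None
    for i, r in enumerate(samples, start=1):
        if r == 0:
            x_a += 1
        else:
            x_b += 1
        if i == 1:
            first = r
        if x_a == x_b:
            tie = True
    return first, tie, x_a, x_b

def count_outcomes(n, n_a, n_b, first_vote):
    """Number of sample sequences with the given first vote, a tie, and final counts."""
    return sum(
        1
        for samples in product((0, 1), repeat=n)
        if ballot(samples) == (first_vote, True, n_a, n_b)
    )

for n_a, n_b in [(4, 3), (5, 2)]:
    n = n_a + n_b
    print(n_a, n_b, count_outcomes(n, n_a, n_b, 0), count_outcomes(n, n_a, n_b, 1))
```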


7 Related Work 


Probabilistic programs have been a long-standing target of formal verification. 
We compare with two of the most well-developed lines of research: probabilistic 
model checking and deductive verification via program logics or expectations. 


Probabilistic Model Checking. Model checking has proven to be a powerful 
tool for verifying probabilistic programs, capable of automated proofs for various 
probabilistic properties (typically encoded in probabilistic temporal logics); there 
are now numerous mature implementations (see, e.g., [21] or [3, Chap. 10] for 
more details). In comparison, our approach has the advantage of being fully 
constraint-based. This gives it a number of unique features: (i) it applies to 
programs with unknown inputs and variables over infinite domains; (ii) it applies 
to programs sampling from distributions with parameters, or even ones sampling 
from unknown distributions modeled as uninterpreted functions in first-order 
logic; (iii) it applies to distributions over infinite domains; and (iv) the generated 
coupling proofs are compact. At the same time, our approach is specialized to 
the coupling proof technique and is likely to be more incomplete. 


Deductive Verification. Compared to general deductive verification systems 
for probabilistic programs, like program logics [5, 14, 22, 26] or techniques reason- 
ing by pre-expectations [25], the main benefit of our technique is automation— 
deductive verification typically requires an interactive theorem prover to manip- 
ulate complex probabilistic invariants. In general, the coupling proof method lim- 
its reasoning about probabilities and distributions to just the random sampling 
commands; in the rest of the program, the proof can avoid quantitative reasoning
entirely. As a result, our system can work with non-probabilistic invariants and
achieve full automation. Our approach also smoothly handles properties involv- 
ing the probabilities of multiple events, like probabilistic independence, unlike 
techniques that analyze probabilistic events one-by-one.


Abstract. We address the problem of synthesizing provably correct
controllers for linear systems with reach-avoid specifications. Our solu- 
tion uses a combination of an open-loop controller and a tracking con- 
troller, thereby reducing the problem to smaller tractable problems. 
We show that, once a tracking controller is fixed, the reachable states 
from an initial neighborhood, subject to any disturbance, can be over- 
approximated by a sequence of ellipsoids, with sizes that are indepen- 
dent of the open-loop controller. Hence, the open-loop controller can 
be synthesized independently to meet the reach-avoid specification for 
an initial neighborhood. Exploiting several techniques for tightening the 
over-approximations, we reduce the open-loop controller synthesis prob- 
lem to satisfiability over quantifier-free linear real arithmetic. The overall 
synthesis algorithm computes a tracking controller, and then iteratively
covers the entire initial set to find open-loop controllers for initial neigh- 
borhoods. The algorithm is sound and, for a class of robust systems, is 
also complete. We present REALSYN, a tool implementing this synthesis 
algorithm, and we show that it scales to several high-dimensional systems 
with complex reach-avoid specifications. 


1 Introduction 


The controller synthesis question asks whether an input can be generated for a
given system (or a plant) so that it achieves a given specification. Algorithms 
for answering this question hold the promise of automating controller design. 
They have the potential to yield high-assurance systems that are correct-by- 
construction, and even negative answers to the question can convey insights 
about unrealizability of specifications. This is not a new or a solved problem, 
but there has been a resurgence of interest with the rise of powerful tools and
compelling applications such as vehicle path planning [11], motion control [10,
23], circuit design [30], and various other engineering areas.

This work is supported by the grant CCF 1422798 from the National Science Foundation.
© The Author(s) 2018
H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 347-366, 2018.
https://doi.org/10.1007/978-3-319-96145-3_19
C. Fan et al.

In this paper, we study synthesis for linear, discrete-time, plant models with 
bounded disturbance, a standard view of control systems [3,17]. We will consider
reach-avoid specifications which require that starting from any initial state
in $\Theta$, the controller has to drive the system to a target set $G$, while avoiding
certain unsafe states or obstacles $O$. Reach-avoid specifications arise naturally in
many domains such as autonomous and assisted driving, multi-robot coordina- 
tion, and spacecraft autonomy, and have been studied for linear, nonlinear, as 
well as stochastic models [7,9,14,18]. 

Textbook control design methods address specifications like stability, distur- 
bance rejection, asymptotic convergence, but they do not provide formal guar- 
antees about reach-avoid specifications. Another approach is based on discrete 
abstraction, where a discrete, finite-state, symbolic abstraction of the original 
control system is computed, and a discrete controller is synthesized by solving 
a two-player game on the abstracted game graph. Theoretically, these methods 
can be applied to systems with nonlinear dynamics and they can synthesize 
controllers for a general class of LTL specifications. However, in practice, the 
discretization step leads to a severe state space explosion for higher dimensional 
models. Indeed, we did not find any reported evaluation of these tools (see related 
work) on benchmarks that go beyond 5-dimensional plant models. 

In this paper, the controller we synthesize follows a natural paradigm for
designing controllers. The approach is to first design an open-loop controller for
a single initial state $x_0 \in \Theta$ to meet the reach-avoid specification. This is called
the reference trajectory. For the remaining states in the initial set, a tracking
controller is added that drives these other trajectories towards the trajectory
starting from $x_0$.

However, designing such a combined controller can be computationally 
expensive [32] because of the interdependency between the open-loop controller 
and the tracking controller. Our secret sauce in making this approach feasible is
to demonstrate that the two controllers can be synthesized in a decoupled way.
Our strategy is as follows. We first design a tracking controller using a standard
control-theoretical method called LQR (linear quadratic regulator) [5]. The crucial
observation that helps decouple the synthesis of the tracking and open-loop
controller is that for such a combined controller, once the tracking controller is
fixed, the set of states reached from the initial set is contained within a sequence 
of ellipsoidal sets [24] centered around the reference trajectory. The size of these 
ellipsoidal sets is solely dependent on the tracking controller, and is independent 
of the reference trajectory or the open-loop controller. On the flip side, the open- 
loop controller and the resulting reference trajectory can be chosen independent 
of the fixed tracking controller. Based on this, the problem of synthesizing the 
open-loop controller can be completely decoupled from synthesizing the track- 
ing controller. Our open-loop controller is synthesized by encoding the problem 
in logic. The straightforward encoding of the synthesis problem results in a $\exists\forall$
formula in the theory of linear arithmetic. Unfortunately, solving large instances
of such formulas using current SMT solvers is challenging. To overcome this, we
exploit special properties of polytopes and hyper-rectangles, and reduce the original
$\exists\forall$-formula into the quantifier-free fragment of linear arithmetic (QF-LRA).
Our overall algorithm (Algorithm 1), after computing an initial tracking controller,
iteratively synthesizes open-loop controllers by solving QF-LRA formulas
for smaller subsets that cover the initial set. The algorithm will automatically
identify the set of initial states for which the combined tracking+open-loop controller
is guaranteed to work. Our algorithm is sound (Theorem 1), and for a
class of robust linear systems, it is also complete (Theorem 2). 

We have implemented the synthesis algorithm in a tool called REALSYN. Any 
SMT solver can be plugged in for solving the open-loop problem; we present
experimental results with Z3, CVC4 and Yices. We report the performance on 24 
benchmark problems (using all three solvers). Results show that our approach 
scales well for complex models—including a system with 84-dimensional dynam- 
ics, another system with 3 vehicles (12-dimensional) trying to reach a common 
goal while avoiding collision with the obstacles and each other, and yet another 
system with 10 vehicles (20-dimensional) trying to maintain a platoon. REALSYN
usually finds a controller within 10 min with the fastest SMT solver. The
closest competing tool, Tulip [13,39], does not return any result even for some 
of the simpler instances. 


Related Work. We briefly review related work on formal controller synthesis 
according to the plant model type, specifications, and approaches. 


Plants and Specifications. In increasing order of generality, the types of 
plant models that have been considered for controller synthesis are double- 
integrator models [10], linear dynamical models [20,28,34,38], piecewise affine 
models [18,40], and nonlinear (possibly switched) models [7,25,31,33]. There is 
also a line of work on synthesis approaches for stochastic plants (see [1], and 
the references therein). With the exceptions noted below, most of these papers 
consider continuous time plant models, unlike our work. 

There are three classes of specifications typically used for synthesis. In the
order of generality, they are: (1) pure safety or invariance specifications [2,15,33],
(2) reach-avoid [7,14,15,18,33], and (3) more general LTL and GR(1)
[16,20,26,38,39,40]. For each of these classes both bounded and unbounded-time variants
have been considered. 


Synthesis Tools. There is a growing set of controller synthesis algorithms 
that are available as implemented tools and libraries. This includes tools like 
CoSyMa [27], Pessoa [30], LTLMop [22,37], Tulip [13,39], SCOTS [31], that rely 
on the computation of some sort of a discrete (or symbolic) abstraction. Our 
trial with a 4-dimensional example on Tulip [13,39] did not finish the discretiza- 
tion step in one hour. LTLMop [22,37] handles GR(1) LTL specifications, which 
are more general than reach-avoid specifications considered in this paper, but it 
is designed for 2-dimensional robot models working in the Euclidean plane. An 
alternative synthesis approach generates mode switching sequences for switched 
system models [19,21,29,35,41] to meet the specifications. This line of work
focuses on a finite input space, instead of the infinite input space we are considering
in this paper. Abate et al. [2] use a controller template similar to the one
considered in this paper for invariant specifications. A counter-example guided 
inductive synthesis (CEGIS) approach is used to first find a feedback controller 
for stabilizing the system. Since this feedback controller may not be safe for all 
initial states of the system, a separate verification step is employed to verify 
safety, or alternatively find a counter-example. In the latter case, the process is 
repeated until a valid controller is found. This is different from our approach, 
where any controller found needs no further verification. Several of the bench- 
marks are adopted from [2]. 


2 Preliminaries and Problem Statement 

Notation. For a set $A$ and a finite sequence $\sigma$ in $A^*$, we denote the $t$-th element
of $\sigma$ by $\sigma[t]$. $\mathbb{R}^n$ is the $n$-dimensional Euclidean space. Given a vector $x \in \mathbb{R}^n$,
$x(i)$ is the $i$-th component of $x$. We will use boldfaced letters (for example,
$\mathbf{x}$, $\mathbf{d}$, $\mathbf{u}$, etc.) to denote a sequence of vectors.

For a vector $x$, $x^T$ is its transpose. Given an invertible matrix $M \in \mathbb{R}^{n \times n}$,
$\|x\|_M = \sqrt{x^T M^T M x}$ is called the $M$-norm of $x$. For $M = I$, $\|x\|_M$ is the familiar
2-norm. Alternatively, $\|x\|_M = \|Mx\|_2$. For a matrix $A$, $A \succ 0$ means $A$ is
positive definite. Given two symmetric matrices $A$ and $B$, $A \preceq B$ means $A - B$ is
negative semi-definite. Given a matrix $A$ and an invertible matrix $M$ of the same
dimension, there exists an $\alpha \ge 0$ such that $A^T M^T M A \preceq \alpha M^T M$. Intuitively, $\alpha$
is the largest scaling factor that can be achieved by the linear transformation
from $x$ to $Ax$ when using $M$ for computing the norm, and can be found as the
largest eigenvalue of the symmetric matrix $(M A M^{-1})^T (M A M^{-1})$.

Given a vector $c \in \mathbb{R}^n$, an invertible matrix $M$, and a scalar value $r \ge 0$,
we define $\mathcal{E}_r(c, M) = \{x \mid \|x - c\|_M \le r\}$ to be the ellipsoid centered at $c$ with
radius $r$ and shape $M$. $\mathcal{B}_r(c) = \mathcal{E}_r(c, I)$ is the ball of radius $r$ centered at $c$.
Given two vectors $c, v \in \mathbb{R}^n$, $\mathcal{R}_v(c) = \{x \mid \bigwedge_{i=1}^{n} c(i) - v(i) \le x(i) \le c(i) + v(i)\}$
is the rectangle centered at $c$ with the length vector $v$. For a set $S \subseteq \mathbb{R}^n$, a
vector $v \in \mathbb{R}^n$, and a matrix $M \in \mathbb{R}^{n \times n}$ we define $v \oplus S = \{x + v \mid x \in S\}$ and
$M \otimes S = \{Mx \mid x \in S\}$. We say a set $S \subseteq \mathbb{R}^n$ is a polytope if there is a matrix
$A \in \mathbb{R}^{m \times n}$ and a vector $b \in \mathbb{R}^m$ such that $S = \{x \mid Ax \le b\}$, and denote by $vert(S)$
the set of vertices of $S$.
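
The definitions of the $M$-norm, ellipsoids, and rectangles translate directly into membership tests. The Python sketch below is illustrative only; the function names are our own and matrices are plain nested lists.

```python
import math

def mat_vec(M, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def m_norm(x, M):
    """The M-norm of x, computed as the 2-norm of Mx."""
    return math.sqrt(sum(c * c for c in mat_vec(M, x)))

def in_ellipsoid(x, c, M, r):
    """Membership in the ellipsoid centered at c with radius r and shape M."""
    return m_norm([xi - ci for xi, ci in zip(x, c)], M) <= r

def in_rectangle(x, c, v):
    """Membership in the rectangle centered at c with length vector v."""
    return all(ci - vi <= xi <= ci + vi for xi, ci, vi in zip(x, c, v))

identity = [[1.0, 0.0], [0.0, 1.0]]
shape = [[2.0, 0.0], [0.0, 1.0]]  # shrinks the ellipsoid along the first axis
print(m_norm([3.0, 4.0], identity), m_norm([1.0, 0.0], shape))
print(in_ellipsoid([0.4, 0.0], [0.0, 0.0], shape, 1.0))
```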


2.1 Discrete Time Linear Control Systems 


An $(n, m)$-dimensional discrete-time linear system $\mathcal{A}$ is a 5-tuple $\langle A, B, \Theta, U, D \rangle$,
where (i) $A \in \mathbb{R}^{n \times n}$ is called the dynamic matrix, (ii) $B \in \mathbb{R}^{n \times m}$ is called the
input matrix, (iii) $\Theta \subseteq \mathbb{R}^n$ is a set of initial states, (iv) $U \subseteq \mathbb{R}^m$ is the space of
inputs, (v) $D \subseteq \mathbb{R}^n$ is the space of disturbances.

A control sequence for an $(n, m)$-dimensional system $\mathcal{A}$ is a (possibly infinite)
sequence $\mathbf{u} = \mathbf{u}[0], \mathbf{u}[1], \ldots$, where each $\mathbf{u}[t] \in U$. Similarly, a disturbance
sequence for $\mathcal{A}$ is a (possibly infinite) sequence $\mathbf{d} = \mathbf{d}[0], \mathbf{d}[1], \ldots$, where each
$\mathbf{d}[t] \in D$. Given control $\mathbf{u}$ and disturbance $\mathbf{d}$, and an initial state $\mathbf{x}[0] \in \Theta$, the
execution of $\mathcal{A}$ is uniquely defined as the (possibly infinite) sequence of states
$\mathbf{x} = \mathbf{x}[0], \mathbf{x}[1], \ldots$, where for each $t \ge 0$,


$$\mathbf{x}[t+1] = A\mathbf{x}[t] + B\mathbf{u}[t] + \mathbf{d}[t]. \qquad (1)$$
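
Equation (1) can be unrolled step by step to obtain an execution. The Python sketch below does this for finite control and disturbance sequences; the matrices and inputs are hypothetical values chosen for illustration.

```python
def mat_vec(M, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def simulate(A, B, x0, controls, disturbances):
    """Return the execution x[0], ..., x[T] defined by Eq. (1)."""
    xs = [list(x0)]
    for u, d in zip(controls, disturbances):
        ax, bu = mat_vec(A, xs[-1]), mat_vec(B, u)
        xs.append([a + b + di for a, b, di in zip(ax, bu, d)])
    return xs

# Hypothetical double-integrator-like system with a scalar input.
A = [[1.0, 1.0], [0.0, 1.0]]
B = [[0.0], [1.0]]
print(simulate(A, B, [0.0, 0.0], [[1.0], [1.0]], [[0.0, 0.0]] * 2))
```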


A (state feedback) controller for $\mathcal{A}$ is a function $g : \Theta \times \mathbb{R}^n \to \mathbb{R}^m$ that maps
an initial state and a (current) state to an input. That is, given an initial state
$x_0 \in \Theta$ and state $x \in \mathbb{R}^n$ at time $t$, the control input to the plant at time $t$ is:


$$\mathbf{u}[t] = g(x_0, x). \qquad (2)$$


This controller is allowed to use the memory of some initial state $x_0$ (not necessarily
the current execution's initial state) for deciding the current state-dependent
feedback. Thus, given an initial state $\mathbf{x}[0]$, a disturbance $\mathbf{d}$, and a state feedback
controller $g$, Eqs. (1) and (2) define a unique execution $\mathbf{x}$ of $\mathcal{A}$. A state $x$ is
reachable in $t$-steps if there exists an execution $\mathbf{x}$ of $\mathcal{A}$ such that $\mathbf{x}[t] = x$. The
set of all reachable states from $S \subseteq \Theta$ in exactly $T$ steps using the controller $g$ is
denoted by $\mathsf{Reach}_{\mathcal{A},g}(S, T)$. When $\mathcal{A}$ and $g$ are clear from the context, we write
$\mathsf{Reach}(S, T)$.


2.2 Bounded Controller Synthesis Problem 


Given an $(n, m)$-dimensional discrete-time linear system $\mathcal{A}$, a sequence $\mathbf{O}$ of obstacles
or unsafe sets (with $\mathbf{O}[t] \subseteq \mathbb{R}^n$, for each $t$), a goal $G \subseteq \mathbb{R}^n$, and a time bound
$T$, the bounded time controller synthesis problem is to find a state feedback controller
$g$ such that for every initial state $\theta \in \Theta$ and disturbance $\mathbf{d} \in D^T$, the
unique execution $\mathbf{x}$ of $\mathcal{A}$ with $g$, starting from $\mathbf{x}[0] = \theta$ satisfies (i) for all $t \le T$,
$\mathbf{u}[t] \in U$, (ii) for all $t \le T$, $\mathbf{x}[t] \notin \mathbf{O}[t]$, and (iii) $\mathbf{x}[T] \in G$.

For the rest of the paper, we will assume that each of the sets in $\{\mathbf{O}[t]\}_{t \in \mathbb{N}}$,
$G$ and $U$ are closed polytopes. Moreover, we assume that the pair $(A, B)$ is
controllable [3].
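
For a single, already computed execution, the three conditions of the bounded synthesis problem are easy to test. The Python sketch below does so under the simplifying assumption that the input set, the obstacles, and the goal are axis-aligned boxes given as (lower, upper) pairs; the names and data are illustrative, and checking one execution says nothing yet about all initial states and disturbances.

```python
def in_box(x, box):
    """Membership in an axis-aligned box given as (lower, upper) corner lists."""
    lower, upper = box
    return all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper))

def meets_reach_avoid(xs, us, input_box, obstacles, goal_box):
    """Check conditions (i)-(iii) for one execution xs with inputs us.

    obstacles[t] is the list of obstacle boxes active at time t.
    """
    T = len(xs) - 1
    inputs_ok = all(in_box(u, input_box) for u in us)
    safe = all(
        not any(in_box(xs[t], ob) for ob in obstacles[t]) for t in range(T + 1)
    )
    return inputs_ok and safe and in_box(xs[T], goal_box)

# Hypothetical data: one static obstacle, a goal box, and bounded inputs.
obstacle = ([0.5, 1.0], [1.5, 2.0])
goal = ([1.5, -0.5], [2.5, 0.5])
inputs = ([-1.0], [1.0])
safe_run = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
unsafe_run = [[0.0, 0.0], [1.0, 1.5], [2.0, 0.0]]
us = [[1.0], [1.0]]
print(meets_reach_avoid(safe_run, us, inputs, [[obstacle]] * 3, goal))
print(meets_reach_avoid(unsafe_run, us, inputs, [[obstacle]] * 3, goal))
```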


Example. Consider a mobile robot that needs to reach the green area of an
apartment starting from the entrance area, while avoiding the gray areas
(Fig. 1). The robot's dynamics is described by a linear model (for example the
navigation model from [12]). The obstacle sequence $\mathbf{O}$ here is static, that is,
$\mathbf{O}[t] = \mathbf{O}[0]$ for all $t \ge 0$. Both $\Theta$ and $G$ are rectangles. Although these sets
are depicted in 2D, the dynamics of the robot may involve a higher dimensional
state space.

In this example, there is no disturbance, but a similar problem can be
formulated for a drone flying outdoors, in which case the disturbance input would
model the effect of wind. Time-varying obstacle sets are useful for modeling
safety requirements of multi-robot systems.


Fig. 1. The settings for controller synthesis of a mobile robot with
reach-avoid specification.


3 Synthesis Algorithm


3.1 Overview 


The controller synthesis problem requires one to find a state feedback controller
that ensures that the trajectory starting from any initial state in $\Theta$ will meet
the reach-avoid specification. Since the set of initial states $\Theta$ will typically be an
infinite set, this requires the synthesized feedback controller $g$ to have an effective
representation. Thus, an “enumerative” representation, where a (separate) open-loop
controller is constructed for each initial state, is not feasible. Here, by an open-loop
controller for initial state $x_0 \in \Theta$, we mean a control sequence $\mathbf{u}$ such that
the corresponding execution $\mathbf{x}$ with $\mathbf{x}[0] = x_0$ and 0 disturbance satisfies the
reach-avoid constraints. We, therefore, need a useful template that will serve as
the representation for the feedback controller.

In control theory, one natural controller design paradigm is to first find a
reference execution $\mathbf{x}_{\mathrm{ref}}$ which uses an open-loop controller, then add a tracking
controller which tries to force other executions $\mathbf{x}$ starting from different initial
states $\mathbf{x}[0]$ to get close to $\mathbf{x}_{\mathrm{ref}}$ by minimizing the distance between $\mathbf{x}_{\mathrm{ref}}$ and $\mathbf{x}$.
This form of controller combining open-loop control with tracking control is also
proposed in [32] for reach-avoid specifications. The resulting trajectory under a
combination of tracking controller plus reference trajectory can be described by
the following system of equations.


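$$\mathbf{u}[t] = \mathbf{u}_{\mathrm{ref}}[t] + K(\mathbf{x}[t] - \mathbf{x}_{\mathrm{ref}}[t]), \text{ with} \qquad (3)$$
$$\mathbf{x}_{\mathrm{ref}}[t+1] = A\mathbf{x}_{\mathrm{ref}}[t] + B\mathbf{u}_{\mathrm{ref}}[t]$$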


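The tracking controller is given by the matrix $K$ that determines the additive
component of the input based on the difference between the current state and
the reference trajectory. Once $\mathbf{x}_{\mathrm{ref}}[0]$ and the open-loop control sequence $\mathbf{u}_{\mathrm{ref}}$ are
fixed, the value of $\mathbf{x}_{\mathrm{ref}}[t]$ is determined at each time step $t \in \mathbb{N}$. Therefore, the
controller $g$ is uniquely defined by the tuple $\langle K, \mathbf{x}_{\mathrm{ref}}[0], \mathbf{u}_{\mathrm{ref}} \rangle$. We could rewrite
the linear system in (3) as an augmented system


$$\begin{bmatrix} \mathbf{x} \\ \mathbf{x}_{\mathrm{ref}} \end{bmatrix}[t+1] = \begin{bmatrix} A + BK & -BK \\ 0 & A \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ \mathbf{x}_{\mathrm{ref}} \end{bmatrix}[t] + \begin{bmatrix} B & 0 \\ 0 & B \end{bmatrix} \begin{bmatrix} \mathbf{u}_{\mathrm{ref}} \\ \mathbf{u}_{\mathrm{ref}} \end{bmatrix}[t] + \begin{bmatrix} \mathbf{d} \\ 0 \end{bmatrix}[t].$$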


This can be rewritten as $\hat{\mathbf{x}}[t+1] = \hat{A}\hat{\mathbf{x}}[t] + \hat{B}\hat{\mathbf{u}}[t] + \hat{\mathbf{d}}[t]$. The closed-form
solution is $\hat{\mathbf{x}}[t] = \hat{A}^t \hat{\mathbf{x}}[0] + \sum_{i=0}^{t-1} \hat{A}^{t-i-1}(\hat{B}\hat{\mathbf{u}}[i] + \hat{\mathbf{d}}[i])$. To synthesize a controller
$g$ of this form, therefore, requires finding $K$, $\mathbf{x}_{\mathrm{ref}}[0]$, $\mathbf{u}_{\mathrm{ref}}$ such that the closed-form
solution meets the reach-avoid specification. This is indeed the approach followed
in [32], albeit in the continuous time setting. Observe that in the closed-form
solution, $\hat{A}$, $\hat{\mathbf{u}}$, and $\hat{\mathbf{x}}[0]$ all depend on parameters that we need to synthesize.
Therefore, solving such constraints involves polynomials whose degrees grow with
the time bound. This is very expensive, and unlikely to scale to large dimensions
and time bounds.

In this paper, to achieve scalability, we take a slightly different approach
than the one where $K$, $\mathbf{x}_{\mathrm{ref}}[0]$, and $\mathbf{u}_{\mathrm{ref}}$ are simultaneously synthesized. We first
synthesize a tracking controller $K$, independent of $\mathbf{x}_{\mathrm{ref}}[0]$ and $\mathbf{u}_{\mathrm{ref}}$, using the standard
LQR method. Once $K$ is synthesized, we show that, no matter what $\mathbf{x}_{\mathrm{ref}}[0]$
and $\mathbf{u}_{\mathrm{ref}}$ are, the state of the system at time $t$ starting from $x_0$ is guaranteed to
be contained within an ellipsoid centered at $\mathbf{x}_{\mathrm{ref}}[t]$ and of radius that depends
only on $K$, the initial distance between $x_0$ and $\mathbf{x}_{\mathrm{ref}}[0]$, time $t$, and disturbance.
Moreover, this radius is only a linear function of the initial distance (Lemma 1).
Thus, if we can synthesize an open-loop controller $\mathbf{u}_{\mathrm{ref}}$ starting from some state
$\mathbf{x}_{\mathrm{ref}}[0]$, such that ellipsoids centered around $\mathbf{x}_{\mathrm{ref}}$ satisfy the reach-avoid specification,
we can conclude that the combined controller will work correctly for all
initial states in some ball around the initial state $\mathbf{x}_{\mathrm{ref}}[0]$. The radius of the ball
around $\mathbf{x}_{\mathrm{ref}}[0]$ for which the controller is guaranteed to work will depend on the
radii of the ellipsoids around $\mathbf{x}_{\mathrm{ref}}$ that satisfy the reach-avoid specification. This
decoupled approach to synthesis is the first key idea in our algorithm.

Following the above discussion, crucial to the success of the decoupled approach
is to obtain a tight characterization of the radius of the ellipsoid around
$\mathbf{x}_{\mathrm{ref}}[t]$ that contains the reach set, as a function of the initial distance; too
conservative a bound will imply that the combined controller only works for a
tiny set of initial states. The ellipsoid’s shape and direction, which is characterized
by a coordinate transformation matrix $M$, also affect the tightness of
the over-approximations. We determine the shape and direction of the ellipsoids
that give us the tightest over-approximation using an SDP solver (Sect. 3.4).

Synthesizing the tracking controller $K$ still leaves open the problem of synthesizing
an open-loop controller for an initial state $\mathbf{x}_{\mathrm{ref}}[0]$. A straightforward
encoding of the problem of synthesizing an open-loop controller that works for
all initial states in some ball around $\mathbf{x}_{\mathrm{ref}}[0]$ results in a $\exists\forall$-formula in the theory
of real arithmetic. Unfortunately solving such formulas does not scale to
large dimensional systems using current SMT solvers. The next key idea in
our algorithm is to simplify these constraints. By exploiting special properties
of polytopes and hyper-rectangles, we reduce the original $\exists\forall$-formula into the
quantifier-free fragment of linear real arithmetic (QF-LRA) (Sect. 3.5).

Putting it all together, the overall algorithm (Algorithm 1) works as follows. 
After computing an initial tracking controller K, coordinate transformation M 
for optimal ellipsoidal approximation of reach-sets, it synthesizes open-loop con- 
trollers for different initial states by solving QF-LRA formulas. After each open- 
loop controller is synthesized, the algorithm identifies the set of initial states 
for which the combined tracking+open-loop controller is guaranteed to work, 
and removes this set from $\Theta$. In each new iteration, it picks a new initial state
not covered by previous combined controllers, and the process terminates when
all of $\Theta$ is covered. Our algorithm is sound (Theorem 1): whenever a controller
is synthesized, it meets the specifications. Further, for robust systems (defined
later in the paper), our algorithm is guaranteed to terminate when the system
has a combined controller for all initial states (Theorem 2).


3.2 Synthesizing the Tracking Controller K


Given any open-loop controller $\mathbf{u}_{\mathrm{ref}}$ and the corresponding reference execution
$\mathbf{x}_{\mathrm{ref}}$, by replacing in Eq. (1) the controller of Eq. (3) we get:


$$\mathbf{x}[t+1] = (A + BK)\mathbf{x}[t] - BK\mathbf{x}_{\mathrm{ref}}[t] + B\mathbf{u}_{\mathrm{ref}}[t] + \mathbf{d}[t]. \qquad (4)$$


Subtracting $\mathbf{x}_{\mathrm{ref}}[t+1]$ from both sides, we have that for any execution $\mathbf{x}$ starting
from the initial state $\mathbf{x}[0]$ and with disturbance $\mathbf{d}$, the distance between $\mathbf{x}$ and
$\mathbf{x}_{\mathrm{ref}}$ changes with time as:


$$\mathbf{x}[t+1] - \mathbf{x}_{\mathrm{ref}}[t+1] = (A + BK)(\mathbf{x}[t] - \mathbf{x}_{\mathrm{ref}}[t]) + \mathbf{d}[t]. \qquad (5)$$


With $A_c = A + BK$ and $\mathbf{y}[t] = \mathbf{x}[t] - \mathbf{x}_{\mathrm{ref}}[t]$, Eq. (5) becomes $\mathbf{y}[t+1] =
A_c\mathbf{y}[t] + \mathbf{d}[t]$. We want $\mathbf{x}[t]$ to be as close to $\mathbf{x}_{\mathrm{ref}}[t]$ as possible, which means
$K$ should be designed to make $\|\mathbf{y}[t]\|$ converge to 0. Equivalently, $K$ should be
designed as a linear feedback controller such that $A_c$ is stable¹. Such a matrix
$K$ can be computed using classical control theoretic methods. In this work, we
compute $K$ as a linear (stable) feedback controller using LQR as stated in the
following proposition.
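
The decoupling expressed by Eq. (5) can be observed numerically: for a fixed gain, the tracking error does not depend on which reference input is used. The Python sketch below simulates the combined controller of Eq. (3) for two different reference inputs and compares the resulting error sequences. The system matrices and the gain $K$ are hypothetical values picked by hand (for this choice $A + BK$ has eigenvalues $(1 \pm i)/2$, so it is stable); the gain is not computed by LQR.

```python
def mat_vec(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vec_add(*vs):
    return [sum(parts) for parts in zip(*vs)]

def vec_sub(a, b):
    return [x - y for x, y in zip(a, b)]

def tracking_errors(A, B, K, x0, xref0, u_ref, d):
    """Simulate Eqs. (1) and (3) and return y[t] = x[t] - x_ref[t] for all t."""
    x, xref = list(x0), list(xref0)
    errors = [vec_sub(x, xref)]
    for u_r, d_t in zip(u_ref, d):
        u = vec_add(u_r, mat_vec(K, vec_sub(x, xref)))      # Eq. (3)
        x = vec_add(mat_vec(A, x), mat_vec(B, u), d_t)      # Eq. (1)
        xref = vec_add(mat_vec(A, xref), mat_vec(B, u_r))   # reference dynamics
        errors.append(vec_sub(x, xref))
    return errors

A = [[1.0, 1.0], [0.0, 1.0]]
B = [[0.0], [1.0]]
K = [[-0.5, -1.0]]          # hypothetical stabilizing gain
d = [[0.01, 0.0]] * 6       # hypothetical constant disturbance
u_ref_a = [[0.0]] * 6
u_ref_b = [[1.0], [-2.0], [0.5], [0.0], [3.0], [-1.0]]

errs_a = tracking_errors(A, B, K, [1.0, 0.0], [0.0, 0.0], u_ref_a, d)
errs_b = tracking_errors(A, B, K, [1.0, 0.0], [0.0, 0.0], u_ref_b, d)
for ya, yb in zip(errs_a, errs_b):
    print(ya, yb)
```

Up to floating-point rounding, the two error sequences coincide, which is what allows the reference input to be synthesized separately from the tracking gain.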


Proposition 1 (LQR). For linear system $\mathcal{A}$ with $(A, B)$ controllable and
0 disturbance, fix any $Q, R \succ 0$ and let $J = \mathbf{x}^T[T] Q \mathbf{x}[T] + \sum_{i=0}^{T-1} (\mathbf{x}^T[i] Q \mathbf{x}[i] +
\mathbf{u}^T[i] R \mathbf{u}[i])$ be the corresponding quadratic cost. Let $X$ be the unique positive
definite solution to the discrete-time Algebraic Riccati Equation (ARE):
$A^T X A - X - A^T X B (B^T X B + R)^{-1} B^T X A + Q = 0$, and $K := -(B^T X B + R)^{-1} B^T X A$.
Then $A + BK$ is stable, and the corresponding feedback input minimizes $J$.


Methods for choosing $Q$ and $R$ are outside the scope of this paper. We fix $Q$
and $R$ to be identity matrices for most examples. Roughly, for a given $R$, scaling
up $Q$ results in a $K$ that makes an execution $\mathbf{x}$ converge faster to the reference
execution $\mathbf{x}_{\mathrm{ref}}$.
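
For intuition, the scalar case of Proposition 1 can be worked out with a few lines of Python. The sketch below assumes a one-dimensional system with $A = a$, $B = b$, $Q = q$, $R = r$, and solves the Riccati equation by fixed-point iteration; this iteration is a stand-in chosen for simplicity, not the method used by any particular solver, and the numbers are illustrative.

```python
def scalar_lqr(a, b, q=1.0, r=1.0, max_iters=1000, tol=1e-12):
    """Solve the scalar discrete-time Riccati equation by fixed-point iteration.

    Returns (x, k) with k = -(b * x * a) / (b * x * b + r).
    """
    x = q
    for _ in range(max_iters):
        x_next = a * a * x - (a * b * x) ** 2 / (b * b * x + r) + q
        converged = abs(x_next - x) < tol
        x = x_next
        if converged:
            break
    k = -(b * x * a) / (b * b * x + r)
    return x, k

a, b = 1.2, 1.0                # hypothetical unstable open-loop system
x, k = scalar_lqr(a, b)
residual = a * a * x - x - (a * b * x) ** 2 / (b * b * x + 1.0) + 1.0
print(x, k, a + b * k, residual)
```

In this instance the open-loop system is unstable ($|a| > 1$), while the closed-loop value $a + bk$ has magnitude below 1, as the proposition predicts.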


3.3 Reachset Over-Approximation with Tracking Controller 


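We present a method for over-approximating the reachable states of the system
for a given tracking controller $K$ (computed as in Proposition 1) and an open-loop
controller $\mathbf{u}_{\mathrm{ref}}$ (to be computed in Sect. 3.5).


Lemma 1. Consider any $K \in \mathbb{R}^{m \times n}$, any initial set $S \subseteq \mathcal{E}_{r_0}(\mathbf{x}_{\mathrm{ref}}[0], M)$ and
disturbance $D \subseteq \mathcal{E}_{\delta}(0, M)$, where $r_0, \delta \ge 0$ and $M \in \mathbb{R}^{n \times n}$ is invertible.
For any open-loop controller $\mathbf{u}_{\mathrm{ref}}$ and the corresponding reference execution
$\mathbf{x}_{\mathrm{ref}}$,


$$\mathsf{Reach}(S, t) \subseteq \mathcal{E}_{r_t}(\mathbf{x}_{\mathrm{ref}}[t], M), \quad \forall\, t \le T, \qquad (6)$$


where $r_t = \alpha^{t/2} r_0 + \sum_{i=0}^{t-1} \alpha^{i/2} \delta$, and $\alpha \ge 0$ is such that
$(A + BK)^T M^T M (A + BK) \preceq \alpha M^T M$.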

¹ A + BK has spectral radius $\rho(A + BK) < 1$.

Lemma 1 can be proved using the triangle inequality for the norm of
Eq. (5). From Lemma 1, it follows that given an open-loop controller
$u_{\mathrm{ref}}$ and the corresponding reference trajectory $x_{\mathrm{ref}}$,
the reachable states from $S \subseteq \mathcal{E}_{r_0}(x_{\mathrm{ref}}[0], M)$
at time t can be over-approximated by an ellipsoid centered at
$x_{\mathrm{ref}}[t]$ with size
$r_t = \alpha^{t/2} r_0 + \sum_{i=0}^{t-1} \alpha^{i/2} \delta$. Here M is any
invertible matrix that defines the shape of the ellipsoid and it influences the
value of $\alpha$. As the over-approximation ($r_t$) grows exponentially with t,
it makes sense to choose M in a way that makes $\alpha$ small. In the next
section, we discuss how M and $\alpha$ are chosen to achieve this.
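
The size sequence can be tabulated directly. The sketch below assumes the
one-step bound r_{t+1} = sqrt(alpha) * r_t + delta, which unrolls to the closed
form stated with Lemma 1; the function names and the sample values are
illustrative only.

```python
import math


def reach_radii(alpha, r0, delta, horizon):
    """Return [r_0, ..., r_T] using r_{t+1} = sqrt(alpha) * r_t + delta."""
    radii = [r0]
    for _ in range(horizon):
        radii.append(math.sqrt(alpha) * radii[-1] + delta)
    return radii


def closed_form(alpha, r0, delta, t):
    """r_t = alpha^(t/2) * r0 + sum_{i<t} alpha^(i/2) * delta."""
    return alpha ** (t / 2) * r0 + sum(alpha ** (i / 2) * delta for i in range(t))


radii = reach_radii(alpha=0.25, r0=1.0, delta=0.1, horizon=3)
```

For alpha below 1 the contribution of the initial radius decays geometrically
and the disturbance term stays bounded; for alpha above 1 both terms grow with
t, which is why a small alpha matters for tight over-approximations.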


3.4 Shaping Ellipsoids for Tight Over-Approximating Hyper-rectangles


The choice of M and the resulting $\alpha$ may seem like a minor detail, but a bad
choice here can doom the rest of the algorithm to be impractical. For example,
if we fix M to be the identity matrix I, the resulting value of $\alpha$ may give
over-approximations that are too conservative. Even if the actual executions
converge to $x_{\mathrm{ref}}$, the resulting over-approximation can blow up
exponentially.
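
To make the point about M = I concrete: with $M^T M = I$, the smallest $\alpha$
satisfying $(A + BK)^T (A + BK) \preceq \alpha I$ is the largest eigenvalue of
$(A + BK)^T (A + BK)$. The sketch below computes it for a hypothetical 2x2
closed-loop matrix (chosen for illustration, not taken from the benchmarks)
whose eigenvalues are both 0.5, so executions converge, yet the resulting
$\alpha$ is far above 1.

```python
import math


def alpha_for_identity(acl):
    """Largest eigenvalue of acl^T acl for a 2x2 matrix acl given as rows."""
    (a, b), (c, d) = acl
    # Entries of the symmetric matrix G = acl^T acl.
    g11 = a * a + c * c
    g12 = a * b + c * d
    g22 = b * b + d * d
    trace = g11 + g22
    det = g11 * g22 - g12 * g12
    return (trace + math.sqrt(trace * trace - 4.0 * det)) / 2.0


# Hypothetical closed-loop matrix A + BK: upper triangular, eigenvalues 0.5, 0.5.
acl = [[0.5, 10.0], [0.0, 0.5]]
alpha_identity = alpha_for_identity(acl)
spectral_radius_squared = 0.5 ** 2
```

Here `alpha_identity` is about 100.5 while the squared spectral radius is 0.25.
A better-shaped M can bring $\alpha$ arbitrarily close to the squared spectral
radius, which is the gap the optimization over the ellipsoid shape is meant to
close.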

We find the smallest exponential convergence/divergence rate ($\alpha$) by solving
for P in the following semi-definite program (SDP):

$$\min_{P \succ 0,\ \alpha \in \mathbb{R}} \alpha \quad \text{s.t.} \quad (A + BK)^T P (A + BK) \preceq \alpha P. \tag{7}$$

This gives M as the unique matrix such that $P = M^T M$.

In the rest of the paper, the reachset over-approximations will be represented
by hyper-rectangles to allow us to efficiently use the existing SMT solvers. That
is, the ellipsoids given by Lemma 1 have to be bounded by hyper-rectangles. For
any coordinate transformation matrix M, the ellipsoid with unit size satisfies
$\mathcal{E}_1(0, M) \subseteq R_v(0)$, with
$v(i) = \max_{x \in \mathcal{E}_1(0, M)} x(i)$. This v(i) is also computed by
solving an SDP. Similarly, $\mathcal{E}_r(0, M) \subseteq R_{rv}(0)$. Therefore,
from Lemma 1, it follows that
$\mathrm{Reach}(S, t) \subseteq R_{r_t v}(x_{\mathrm{ref}}[t])$ with
$r_t = \alpha^{t/2} r_0 + \sum_{i=0}^{t-1} \alpha^{i/2} \delta$, where v is the
size vector of the rectangle bounding $\mathcal{E}_1(0, M)$. These optimization
problems for computing M, $\alpha$, and v have to be solved once per synthesis
problem.
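
For intuition, the bounding vector v has a closed form when the unit ellipsoid
is $\{x : \|Mx\|_2 \leq 1\}$ (an assumption here, since the norm of Eq. (5) is
defined earlier in the paper): in that case $v(i) = \sqrt{(P^{-1})_{ii}}$ with
$P = M^T M$. The sketch below evaluates this for a 2x2 matrix M. It is an
illustrative alternative to the SDP mentioned above, with a hypothetical
function name, and not a description of the tool's implementation.

```python
import math


def bounding_half_widths(m):
    """Half-widths of the tightest axis-aligned box around {x : |Mx|_2 <= 1}."""
    (a, b), (c, d) = m
    # P = M^T M for the 2x2 matrix M given as rows.
    p11 = a * a + c * c
    p12 = a * b + c * d
    p22 = b * b + d * d
    det = p11 * p22 - p12 * p12
    if det <= 0.0:
        raise ValueError("M must be invertible")
    # The diagonal of P^{-1} is (p22 / det, p11 / det).
    return [math.sqrt(p22 / det), math.sqrt(p11 / det)]


v = bounding_half_widths([[1.0, 1.0], [0.0, 1.0]])
```

For this M the ellipsoid is $(x_1 + x_2)^2 + x_2^2 \leq 1$, whose extent is
$\sqrt{2}$ along the first coordinate and 1 along the second, matching the
returned vector.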


Example. Continuing the previous example, suppose the robot is asked to reach
the target set in 20 steps. Figure 2 shows the projection of the reachset on
the robot's position with the synthesized controller. The curves are the
reference executions $x_{\mathrm{ref}}$ from 2 initial covers and the rectangles
are reachset over-approximations such that every execution of the system
starting from each initial cover is guaranteed to be inside the rectangles at
each time step.

Fig. 2. Robot's position with the synthesized controllers using Algorithm 1.


3.5 Synthesis of Open-Loop Controller


In this section, we will discuss the synthesis of the open-loop controller
$u_{\mathrm{ref}}$ in $(K, x_{\mathrm{ref}}[0], u_{\mathrm{ref}})$. From the
previous section, we know that given an initial set S, a tracking controller K,
and an open-loop controller $u_{\mathrm{ref}}$, the reachable set (under any
disturbance) at time t is over-approximated by $R_{r_t v}(x_{\mathrm{ref}}[t])$.
Thus, once we fix K and $x_{\mathrm{ref}}[0]$, the problem of synthesizing a
controller reduces to the problem of synthesizing an appropriate
$u_{\mathrm{ref}}$ such that the reachset over-approximations meet the
reach-avoid specification. Indeed, for the rest of the presentation, we
will assume a fixed K.

For synthesizing $u_{\mathrm{ref}}$, we would like to formalize the problem in
terms of constraints that will allow us to use SMT solvers. In the following, we
describe the details of how this problem can be formalized as a quantifier-free
first order formula over the theory of reals. We will then lay out specific
assumptions and/or simplifications required to reduce the problem to QF-LRA
theory, which is implemented efficiently in existing state-of-the-art SMT
solvers. Most SMT solvers also provide the functionality of explicit model
generation, and the concrete controller values can be read off from the models
generated when the constraints are satisfiable.


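Constraints for Synthesizing $u_{\mathrm{ref}}$. Let us fix an initial state
$x_0$ and a radius r, defining a set of initial states $S = B_r(x_0)$. The
$u_{\mathrm{ref}}$ synthesis problem can be stated as finding satisfying
solutions for the formula $\phi_{\mathrm{synth}}(x_0, r)$.

$$
\begin{aligned}
\phi_{\mathrm{synth}}(x_0, r) \triangleq\ & \exists\, u_{\mathrm{ref}}[0], u_{\mathrm{ref}}[1], \ldots, u_{\mathrm{ref}}[T-1], \\
& \phantom{\exists\,} x_{\mathrm{ref}}[0], x_{\mathrm{ref}}[1], \ldots, x_{\mathrm{ref}}[T] : \\
& \phi_{\mathrm{control}}(u_{\mathrm{ref}}) \wedge \phi_{\mathrm{execution}}(u_{\mathrm{ref}}, x_{\mathrm{ref}}, x_0) \\
& \wedge\ \phi_{\mathrm{avoid}}(x_0, r, u_{\mathrm{ref}}, x_{\mathrm{ref}}) \wedge \phi_{\mathrm{reach}}(x_0, r, u_{\mathrm{ref}}, x_{\mathrm{ref}})
\end{aligned}
\tag{8}
$$

where $\phi_{\mathrm{control}}$ constrains the space of inputs,
$\phi_{\mathrm{execution}}$ states that the sequence $x_{\mathrm{ref}}$ is a
reference execution following Eq. (3), $\phi_{\mathrm{avoid}}$ specifies the
safety constraint, and $\phi_{\mathrm{reach}}$ specifies that the system reaches G:

$$
\begin{aligned}
\phi_{\mathrm{control}}(u_{\mathrm{ref}}) &\triangleq \bigwedge_{t=0}^{T-1} u_{\mathrm{ref}}[t] \oplus \left( K \otimes R_{r_t v}(0) \right) \subseteq U \\
\phi_{\mathrm{execution}}(u_{\mathrm{ref}}, x_{\mathrm{ref}}, x_0) &\triangleq (x_{\mathrm{ref}}[0] = x_0) \wedge \bigwedge_{t=0}^{T-1} \left( x_{\mathrm{ref}}[t+1] = A x_{\mathrm{ref}}[t] + B u_{\mathrm{ref}}[t] \right) \\
\phi_{\mathrm{avoid}}(x_0, r, u_{\mathrm{ref}}, x_{\mathrm{ref}}) &\triangleq \bigwedge_{t=0}^{T} R_{r_t v}(x_{\mathrm{ref}}[t]) \cap O[t] = \emptyset \\
\phi_{\mathrm{reach}}(x_0, r, u_{\mathrm{ref}}, x_{\mathrm{ref}}) &\triangleq R_{r_T v}(x_{\mathrm{ref}}[T]) \subseteq G.
\end{aligned}
\tag{9}
$$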


As discussed in Sect. 3.2, the vector v and the constants $r_0, \ldots, r_T$ are
precomputed using the radius r of the initial ball.

We make a few remarks about this formulation. First, each of the formulas
$\phi_{\mathrm{control}}$, $\phi_{\mathrm{avoid}}$ and $\phi_{\mathrm{reach}}$
represents sufficient conditions to check for the existence of
$u_{\mathrm{ref}}$. Second, the constraints stated above belong to the
(decidable) theory of reals. However, $\phi_{\mathrm{control}}$,
$\phi_{\mathrm{avoid}}$ and $\phi_{\mathrm{reach}}$, and thus
$\phi_{\mathrm{synth}}$, are not quantifier free as they use subset and
disjointness checks. This is because for sets S, T expressed as predicates
$\varphi_S(\cdot)$ and $\varphi_T(\cdot)$, $S \cap T = \emptyset$ corresponds to
the formula $\forall x \cdot \neg(\varphi_S(x) \wedge \varphi_T(x))$ and
$S \subseteq T$ (or equivalently $S \cap T^c = \emptyset$) corresponds to
the formula $\forall x \cdot \varphi_S(x) \Rightarrow \varphi_T(x)$.


Reduction to QF-LRA. Since the sets G and U are bounded polytopes, $G^c$ and
$U^c$ can be expressed as finite unions of (possibly unbounded) polytopes. Thus,
the subset predicates
$u_{\mathrm{ref}}[t] \oplus (K \otimes R_{r_t v}(0)) \subseteq U$ in
$\phi_{\mathrm{control}}$ and $R_{r_T v}(x_{\mathrm{ref}}[T]) \subseteq G$
in $\phi_{\mathrm{reach}}$ can be expressed as a disjunction over finitely many
predicates, each expressing the disjointness of two polytopes.

The central idea behind eliminating the universal quantification in the
disjointness predicates in $\phi_{\mathrm{avoid}}$ or in the inferred
disjointness predicates in $\phi_{\mathrm{reach}}$ and $\phi_{\mathrm{control}}$
is to find a separating hyperplane that witnesses the disjointness
of two polytopes. Let $P_1 = \{x \mid A_1 x \leq b_1\}$ and
$P_2 = \{x \mid A_2 x \leq b_2\}$ be two polytopes such that $P_1$ is closed and
bounded. Then, if there is an i for which each vertex v of $P_1$ satisfies
$A_2^{(i)} v > b_2(i)$, we must have that $P_1 \cap P_2 = \emptyset$, where
$A_2^{(i)}$ is the i-th row vector of the matrix $A_2$. That is, such a check is
sufficient to ensure disjointness. Thus, in the formula $\phi_{\mathrm{avoid}}$,
in order to check if $R_{r_t v}(x_{\mathrm{ref}}[t])$ does not intersect with
O[t], we check if there is a face of the polytope O[t] such that all the
vertices of $R_{r_t v}(x_{\mathrm{ref}}[t])$ lie on the other side of the face.
The same holds for each of the inferred predicates in $\phi_{\mathrm{reach}}$
and $\phi_{\mathrm{control}}$. Eliminating quantifiers is essential to scale our
analysis to large high dimensional systems.
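
The vertex-based check can be sketched on concrete numbers as follows. This is
a minimal illustration, assuming the rectangle is given by a center and
half-widths and the obstacle by a matrix and vector with $A_2 x \leq b_2$; the
function names and the sample obstacle are hypothetical. In the actual encoding
the rectangle center is a solver variable, so each vertex inequality becomes a
linear constraint rather than a Boolean test.

```python
from itertools import product


def rectangle_vertices(center, half_widths):
    """All 2^n corners of the axis-aligned rectangle around center."""
    corners = []
    for signs in product((-1.0, 1.0), repeat=len(center)):
        corners.append([c + s * w for c, s, w in zip(center, signs, half_widths)])
    return corners


def separated_by_some_face(vertices, a2, b2):
    """True if some row i of A2 x <= b2 has A2[i] . v > b2[i] for every vertex v."""
    for row, bound in zip(a2, b2):
        if all(sum(r * x for r, x in zip(row, v)) > bound for v in vertices):
            return True
    return False


# Hypothetical obstacle: the unit box [0, 1]^2 written as A2 x <= b2.
A2 = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b2 = [1.0, 0.0, 1.0, 0.0]

far = rectangle_vertices(center=[3.0, 0.5], half_widths=[0.5, 0.5])
near = rectangle_vertices(center=[1.2, 0.5], half_widths=[0.5, 0.5])
```

The rectangle `far` lies entirely beyond the face $x_1 \leq 1$, so the check
succeeds; `near` straddles that face and no other face separates it, so the
check fails. Because the test is only sufficient, a failure does not by itself
prove that the two sets intersect.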

Further, when the set G has a hyper-rectangle representation, the containment
check $R_{r_T v}(x_{\mathrm{ref}}[T]) \subseteq G$ can directly be encoded as the
conjunction of O(n) linear inequalities, stating that for each dimension i, the
lower and the upper bounds of $R_{r_T v}(x_{\mathrm{ref}}[T])$ in the i-th
dimension satisfy $l'_i \leq l_i \leq u_i \leq u'_i$, where $l'_i$ and $u'_i$
represent the bounds for G in the i-th dimension. Similarly, when O[t] has a
rectangle representation, we can formulate the emptiness constraint
$R_{r_t v}(x_{\mathrm{ref}}[t]) \cap O[t] = \emptyset$ as
$\bigvee_{i=1}^{n} (u_i < l'_i \vee l_i > u'_i)$, where $l_i$ and $u_i$ (resp.
$l'_i$ and $u'_i$) are the lower and upper bounds of
$R_{r_t v}(x_{\mathrm{ref}}[t])$ (resp. O[t]) in the i-th dimension.
Since such simplifications can exponentially reduce the number of constraints
generated, they play a crucial role for the scalability.
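
The two rectangle checks translate into a few lines of code. The sketch below
evaluates them on concrete bounds, assuming that $R_{r_t v}(c)$ denotes the
rectangle centered at c with half-width $r_t v(i)$ in dimension i (the helper
names are hypothetical). Containment is a conjunction over dimensions and
disjointness is a disjunction over dimensions, each with O(n) comparisons.

```python
def rectangle_bounds(center, r_t, v):
    """Lower and upper bounds of the rectangle: center(i) -/+ r_t * v(i)."""
    lower = [c - r_t * w for c, w in zip(center, v)]
    upper = [c + r_t * w for c, w in zip(center, v)]
    return lower, upper


def contained_in(lower, upper, goal_lower, goal_upper):
    """Conjunction over dimensions: l'_i <= l_i and u_i <= u'_i."""
    return all(gl <= l and u <= gu
               for l, u, gl, gu in zip(lower, upper, goal_lower, goal_upper))


def disjoint_from(lower, upper, obs_lower, obs_upper):
    """Disjunction over dimensions: u_i < l'_i or l_i > u'_i."""
    return any(u < ol or l > ou
               for l, u, ol, ou in zip(lower, upper, obs_lower, obs_upper))


low, up = rectangle_bounds(center=[2.0, 2.0], r_t=0.5, v=[1.0, 2.0])
inside_goal = contained_in(low, up, [0.0, 0.0], [4.0, 4.0])
clear_of_obstacle = disjoint_from(low, up, [3.0, 0.0], [4.0, 4.0])
```

In an SMT encoding the same comparisons are emitted as linear inequalities over
the unknown center and radius instead of being evaluated on numbers.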

The constraints for checking emptiness and disjointness, as discussed above,
only give rise to linear constraints, do not have the $\forall$ quantification
over states, and yield a sound transformation of $\phi_{\mathrm{synth}}$ into
QF-LRA. In Sect. 3.6 we will see that the reach set over-approximation can be
made arbitrarily small when the disturbance is 0 by arbitrarily shrinking the
size of the initial cover. Thus, these checks will also turn out to be
sufficient to ensure that if there exists a controller,
$\phi_{\mathrm{synth}}$ is satisfiable.


Lemma 2. Let $v \in \mathbb{R}^n$ and $r_0, \ldots, r_T \in \mathbb{R}$ be such
that for any execution $x_{\mathrm{ref}}$ starting at $x_0$, we have
$\forall t \leq T \cdot \mathrm{Reach}(B_r(x_0), t) \subseteq R_{r_t v}(x_{\mathrm{ref}}[t])$.
If the formula $\phi_{\mathrm{synth}}(x_0, r)$ is satisfiable, then there is a
control sequence $u_{\mathrm{ref}}$ such that for every $x \in B_r(x_0)$ and for
every $d \in D^T$, the unique execution $\mathbf{x}$ defined by the controller
$(K, x_0, u_{\mathrm{ref}})$ and d, starting at x, satisfies
$\mathbf{x}[T] \in G \wedge \forall t \leq T \cdot \mathbf{x}[t] \notin O[t]$.


We remark that a possible alternative for eliminating the $\forall$ quantifier is
the use of Farkas' Lemma, but this gives rise to nonlinear constraints². Indeed,
in our experimental evaluation, we observed the downside of resorting to Farkas'
Lemma in this problem.


3.6 Synthesis Algorithm: Putting It All Together


The presentation in Sect. 3.5 describes how to formalize constraints to generate a
control sequence that works for a subset of the initial set $\Theta$. The overall
synthesis procedure (Algorithm 1) first computes a tracking controller K, then
generates open-loop control sequences and reference executions in order to cover
the entire set $\Theta$.


Algorithm 1. Algorithm for Synthesizing Combined Controller
 1: Input: A, T, O[0], ..., O[T], G, Q, R
 2: r* ← diameter(Θ)/2
 3: K, v, c1, c2 ← BLOATPARAMS(A, T, Q, R)
 4: cover ← ∅
 5: controllers ← ∅
 6: while Θ ⊄ cover do
 7:     ψ_synth ← GETCONSTRAINTS(A, T, O[0], ..., O[T], G, v, c1, c2, r*, cover)
 8:     if CHECKSAT(ψ_synth) = SAT then
 9:         r, u_ref, x_ref ← MODEL(ψ_synth)
10:         cover ← cover ∪ B_r(x_ref[0])
11:         controllers ← controllers ∪ {((K, x_ref[0], u_ref), B_r(x_ref[0]))}
12:     else
13:         r* ← r*/2
14: return controllers;


The procedure BLOATPARAMS computes a tracking controller K, a vector
v and real valued parameters $\{c_1[t]\}_{t \leq T}$, $\{c_2[t]\}_{t \leq T}$,
for the system A and time bound T with Q, R for the LQR method. Given any
reference execution $x_{\mathrm{ref}}$ and an initial set
$B_r(x_{\mathrm{ref}}[0])$, the parameters computed by BLOATPARAMS can be used
to over-approximate $\mathrm{Reach}(B_r(x_{\mathrm{ref}}[0]), t)$ with the
rectangle $R_{v'}(x_{\mathrm{ref}}[t])$, where $v' = (c_1[t] r + c_2[t]) v$. The
computation of these parameters proceeds as follows. Matrix K is determined
using LQR (Proposition 1). Now we use Equation (7) to compute the matrix M and
the rate of convergence $\alpha$. Vector v is then computed such that
$\mathcal{E}_1(0, M)$ is bounded by $R_v(0)$. Let
$r_{\mathrm{unit}} = \max_{x \in B_1(0)} \|x\|_M$ and
$\delta = \max_{d \in D} \|d\|_M$. Then we have
$B_r(x_0) \subseteq \mathcal{E}_{r \cdot r_{\mathrm{unit}}}(x_0, M)$ for any $x_0$.
The constants $c_1[0], \ldots, c_1[T], c_2[0], \ldots, c_2[T]$ are computed as
$c_1[t] = \alpha^{t/2} r_{\mathrm{unit}}$ and
$c_2[t] = \sum_{i=0}^{t-1} \alpha^{i/2} \delta$; Sects. 3.2-3.4 establish the
correctness guarantees of these parameters. Clearly, these computations are
independent of any reference executions $x_{\mathrm{ref}}$ and control sequences
$u_{\mathrm{ref}}$.


² Farkas' Lemma introduces auxiliary variables that get multiplied with existing
variables $x_{\mathrm{ref}}[0], \ldots, x_{\mathrm{ref}}[T]$, leading to
nonlinear constraints.

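The procedure GETCONSTRAINTS constructs the logical formula
$\psi_{\mathrm{synth}}$ below such that whenever $\psi_{\mathrm{synth}}$ holds,
we can find an initial radius r, and center $x_0$ in the set
$\Theta \setminus \mathit{cover}$ and a control sequence $u_{\mathrm{ref}}$ such
that any controlled execution starting from $B_r(x_0)$ satisfies the reach-avoid
requirements.


$$\psi_{\mathrm{synth}} \triangleq \exists x_0\, \exists r \cdot \left( x_0 \in \Theta \wedge x_0 \notin \mathit{cover} \wedge r > r^* \wedge \phi_{\mathrm{synth}}(x_0, r) \right) \tag{10}$$


Recall that the constants $r_0, \ldots, r_T$ used in $\phi_{\mathrm{synth}}$ are
affine functions of r and thus $\psi_{\mathrm{synth}}$ falls in the QF-LRA
fragment.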

Line 8 checks for the satisfiability of $\psi_{\mathrm{synth}}$. If satisfiable,
we extract the model generated to get the radius of the initial ball, the
control sequence $u_{\mathrm{ref}}$ and the reference execution
$x_{\mathrm{ref}}$ in Line 9. The generated controller
$(K, x_{\mathrm{ref}}[0], u_{\mathrm{ref}})$ is guaranteed to work for the ball
$B_r(x_{\mathrm{ref}}[0])$, which can be marked covered by adding it to the set
cover. In order to keep all the constraints linear, one can further
underapproximate $B_r(x_{\mathrm{ref}}[0])$ with the rectangle
$R_w(x_{\mathrm{ref}}[0])$, where $w(i) = r/\sqrt{n}$ for each dimension
$i \leq n$. If $\psi_{\mathrm{synth}}$ is unsatisfiable, then we reduce
the minimum radius r* (Line 13) and continue to look for controllers, until we
find that $\Theta \subseteq \mathit{cover}$.

The set controllers is the set of pairs $((K, x_0, u_{\mathrm{ref}}), S)$, such
that the controller $(K, x_0, u_{\mathrm{ref}})$ drives the set S to meet the
desired specification. Each time a new controller is found, it is added to the
set controllers together with the initial set for which it works (Line 11). The
following theorem asserts the soundness of Algorithm 1, and it follows from
Lemmas 1 and 2.


Theorem 1. If Algorithm 1 terminates, then the synthesized controller is
correct. That is, (a) for each $x \in \Theta$, there is a
$((K, x_0, u_{\mathrm{ref}}), S) \in \mathit{controllers}$ such that $x \in S$,
and (b) for each $((K, x_0, u_{\mathrm{ref}}), S) \in \mathit{controllers}$, the
unique controller $(K, x_0, u_{\mathrm{ref}})$ is such that for every $x \in S$
and for every $d \in D^T$, the unique execution defined by
$(K, x_0, u_{\mathrm{ref}})$ and d, starting at x, satisfies the
reach-avoid specification.


Algorithm 1 ensures that, upon termination, every $x \in \Theta$ is covered,
i.e., one can construct a combined controller that drives x to G while avoiding
O. However, it may find multiple controllers for a point $x \in \Theta$. This
non-determinism can be easily resolved by picking any controller assigned for x.
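
The control flow of Algorithm 1 can be sketched on a one-dimensional toy
problem. Everything below is an assumption made for illustration: the initial
set is an interval, the SMT query is replaced by a stand-in `toy_check_sat` that
reports SAT exactly when the requested radius is at most a fixed positive bound
`eps` (and then returns that radius), and "controllers" are reduced to
(center, radius) pairs. Only the cover-and-halve loop mirrors the algorithm.

```python
def uncovered_gap(theta, cover):
    """First sub-interval of theta missed by the closed intervals in cover."""
    lo, hi = theta
    point = lo
    for a, b in sorted(cover):
        if a > point:
            return (point, min(a, hi))
        point = max(point, b)
        if point >= hi:
            return None
    return (point, hi)


def toy_check_sat(theta, cover, r_star, eps):
    """Stand-in for CHECKSAT and MODEL: SAT iff r_star <= eps (toy bound)."""
    gap = uncovered_gap(theta, cover)
    if gap is None or r_star > eps:
        return None
    x0 = 0.5 * (gap[0] + gap[1])  # a center that is not yet covered
    return x0, r_star


def synthesize_cover(theta, eps):
    """Cover theta with balls, halving the minimum radius on UNSAT."""
    r_star = (theta[1] - theta[0]) / 2.0
    cover, controllers = [], []
    while uncovered_gap(theta, cover) is not None:
        model = toy_check_sat(theta, cover, r_star, eps)
        if model is not None:
            x0, r = model
            cover.append((x0 - r, x0 + r))
            controllers.append((x0, r))
        else:
            r_star /= 2.0
    return controllers


controllers = synthesize_cover((0.0, 4.0), eps=0.6)
```

On this input the radius is halved from 2 to 1 to 0.5 before the first ball is
accepted, after which balls of radius 0.5 are added until the interval is
covered. Overlapping balls are harmless, which matches the remark that any
controller assigned to a point may be used.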

Below, we show that, under certain robustness assumptions on the system
A, G and the sets O, and in the absence of disturbance, Algorithm 1 terminates.


Robustly Controllable Systems. A system $A = (A, B, \Theta, U, D)$ is said to
be $\varepsilon$-robustly controllable ($\varepsilon > 0$) with respect to the
reach-avoid specification (O, G) and matrix K, if (a) D = {0}, and (b) for every
initial state $\theta \in \Theta$ and for every open-loop controller
$u_{\mathrm{ref}} \in U^T$ such that the unique execution starting from $\theta$
using the open-loop controller $u_{\mathrm{ref}}$ satisfies the reach-avoid
specification, then with the controller $(K, \theta, u_{\mathrm{ref}})$ defined
as in Equation (3),
$\forall t \leq T,\ \mathrm{Reach}(B_{\varepsilon}(\theta), t) \cap O[t] = \emptyset$
and $\mathrm{Reach}(B_{\varepsilon}(\theta), T) \subseteq G$, i.e.,
$\forall x \in B_{\varepsilon}(\theta)$, the unique trajectory $\mathbf{x}$
defined by the controller $(K, \theta, u_{\mathrm{ref}})$ starting from x also
satisfies the reach-avoid specification.


Theorem 2. Let A be $\varepsilon$-robust with respect to the reach-avoid
specification (O, G) and K, for some $\varepsilon > 0$. If there is a controller
for A that satisfies the reach-avoid specification, then Algorithm 1 terminates.


When the system is robust, then (in the absence of any disturbance, i.e.,
D = {0}), the sizes $r_0, r_1, \ldots, r_T$ of the hyper-rectangles that
overapproximate reachsets go arbitrarily close to 0 as the initial cover
converges to a single point (as seen in Lemma 1). Therefore, the
over-approximations can be made arbitrarily precise as r* decreases. Moreover,
as r* approaches 0, Eq. (9) (with simplifications for QF-LRA) also becomes
satisfiable whenever there is a controller. The correctness of Theorem 2 follows
from both these observations.


4 RealSyn Implementation and Evaluation 


4.1 Implementation 


We have implemented our synthesis algorithm in a tool called REALSYN. REALSYN
is written in Python. For solving Eq. (10) it can interface with any SMT
solver through Python APIs. We present experimental results with Z3 (version
4.5.1) [6], Yices (version 2.5.4) [8], and CVC4 (version 1.5) [4]. REALSYN
leverages the incremental solving capabilities of these solvers as follows: The
constraints $\psi_{\mathrm{synth}}$ generated (line 8 in Algorithm 1) can be
expressed as $\exists x_0, \exists r \cdot \psi_1 \wedge \psi_2$, where
$\psi_1 = \phi_{\mathrm{synth}}(x_0, r)$ and
$\psi_2 = x_0 \in \Theta \wedge x_0 \notin \mathit{cover} \wedge r > r^*$. Since
the bulk of the formula, $\phi_{\mathrm{synth}}(x_0, r)$, is in $\psi_1$ and it
does not change across iterations, we can generate this formula only once, and
push it on the context stack of the solvers. The formula $\psi_2$ is different
across iterations, and can be pushed and popped out of the stack as required.
This minimizes the time taken for generation of constraints.


4.2 Evaluation 


We use 24 benchmark examples³ to evaluate the performance of REALSYN with
three different solvers on a standard laptop with Intel® Core™ i7 processor,
16 GB RAM, running Ubuntu 16.04. The results are reported in Table 1. The
results are encouraging and demonstrate the effectiveness of using our approach
and the feasibility of scalable controller synthesis for high dimensional systems
and complex reach-avoid specifications.


Comparison With Other Tools. We considered other controller synthesis
tools for possible comparison with REALSYN. In summary, CoSyMa [27],
Pessoa [30], and SCOTS [31] do not explicitly support discrete-time systems.
LTLMop [22,37] is designed to analyze robotic systems in the (2-dimensional)


³ The examples are available at https://github.com/umangm/realsyn.

Table 1. Controller synthesis using REALSYN and different SMT solvers. An
explanation for the * marked entries can be found in Sect. 4.


 #   Model                  n   m   Z3                          CVC4               Yices
                                    #iter      time(s)          #iter   time(s)    #iter   time(s)
 1   1-robot                2   1   9          0.21             1       0.06       7       0.06
 2   2-robot                4   2   164        12.62            11      0.31       183     2.26
 3   running-example        4   2   N/A        T/O              N/A     T/O        1       319.97
 4   1-car dynamic avoid    4   2   9          53.17            1       96.43      12      8.49
 5   1-car navigation       4   2   18         7.49             1       3.05       17      16.73
 6   2-car navigation       8   4   1          60.14            1       2668.2     1       4.07
 7   3-car navigation      12   6   1          133.42           1       481.88     1       741.73
 8   4-car platoon          8   4   1          0.37             1       0.21       1       0.15
 9   8-car platoon         16   8   1          23.02            1       1.44       1       0.62
10   10-car platoon        20  10   1          459.36           1       20.93      1       7.74
11   Example                3   1   82         2.32             18      0.10       67      0.43
12   Cruise                 1   1   1          0.06             1       0.03       1       0.02
13   Motor                  2   1   1          0.10             1       0.00       1       0.03
14   Helicopter             3   1   81         2.31             13      0.08       70      0.38
15   Magnetic suspension    2   1   39         0.47             2       0.05       39      0.08
16   Pendulum               2   1   30         0.32             8       0.05       42      0.07
17   Satellite              2   1   40         0.46             5       0.05       32      0.06
18   Suspension             4   1   1          0.17             1       0.11       1       0.09
19   Tape                   3   1   1          0.12             1       0.07       1       0.07
20   Inverted pendulum      2   1   39         0.49             2       0.05       39      0.09
21   Magnetic pointer       3   1   44         1.12             12      0.08       134     0.83
22   Helicopter            28   6   N/A (1*)   T/O (650*)       1       651.21     N/A     T/O
23   Building              48   1   1 (1*)     1936.03 (240*)   N/A     T/O        1       552.48
24   Pde                   84   1   N/A (1*)   T/O (1800*)      1       8.48       1       8.87


Euclidean plane and thus not suitable for most of our examples. TuLiP [13,39]
comes closest to addressing the same class of problems. TuLiP relies on
discretization of the state space and a receding horizon approach for
synthesizing controllers for more general GR(1) specifications. However, we
found TuLiP succumbs to the state space explosion problem when discretizing the
state space, and it did not work on most of our examples. For instance, TuLiP
was unable to synthesize a controller for the 2-dimensional system ‘1-robot’
(Table 1), and returned unrealizable. On the benchmark ‘2-robot’ (n = 4), TuLiP
did not return any answer within 1 h. We checked these findings with the
developers and they concurred that it is typical for TuLiP to take hours even
for 4-dimensional systems.

Benchmarks. Our benchmarks and their SMT encodings could be of independent
interest to the verification and SMT community. Examples 1-10 are
vehicle motion planning examples we have designed with reach-avoid
specifications. Benchmarks 1-2 model robots moving on the Euclidean plane, where
each robot is a 2-dimensional system and admits a 1-dimensional input. Starting
from some initial region on the plane, the robots are required to reach the
common goal area within the given time steps, while avoiding certain obstacles.
For ‘2-robot’, the robots are also required to maintain a minimum separation.
Benchmarks 3-7 are discrete vehicular models adopted from [12]. Each vehicle is
a 4-dimensional system with 2-dimensional input. Benchmark 3 is the system used
as our running example. Benchmark 4 describes one ego vehicle running on a
two-lane road, trying to overtake a vehicle in front of it. The second vehicle
serves as the obstacle. Benchmarks 5-7 are similar to Benchmark 2 where the
vehicles are required to reach a common goal area while avoiding collision with
the obstacles and with each other (inspired by a merge). The velocities and
accelerations of the vehicles are also constrained in each of these benchmarks.

Benchmarks 8-10 model multiple vehicles trying to form a platoon by main- 
taining the safe relative distance between consecutive vehicles. The models are 
adopted (and discretized) from [32]. Each vehicle is a 2-dimensional system with 
1-dimensional input. For the 4-car platoon model, the running times reported in 
Table 1 are much smaller than the time (5 min) reported in [32]. This observation 
aligns with our analysis in Sect. 3.1. 

Benchmarks 11-21 are from [2]. The specification here is that the reach set 
has to be within a safe rectangle (that is, G = true). In [2] each model is 
discretized using 8 different time steps and here we randomly pick one for each 
model. In general, the running time of REALSYN is less than those reported 
in [2] (their reported machine had better configuration). On the other hand, the 
synthesized controller from [2] considers quantization errors, while our approach 
does not provide any guarantee for that. 

Benchmarks 22-24 are a set of high dimensional examples adopted and dis- 
cretized from [36]. Similar to previous ones, the only specification is that the 
reach sets starting from an initial state with the controller should be contained 
within a safe rectangle. 


Synthesis Performance. In Table 1, columns ‘n’ and ‘m’ stand for the dimensions
of the state space and input space. For each background solver, ‘#iter’ is the
number of iterations Algorithm 1 required to synthesize a controller, and ‘time’
is the respective running time. We specify a time limit of 1 h and report T/O
(timeout) for benchmarks that do not finish within this limit. All benchmarks
are synthesized for a specification with 10-20 steps.

In general, for low-dimensional systems (for example, in Benchmarks 11-21),
each of the solvers finishes quickly (in less than 1 s), with CVC4 and Yices
outperforming Z3 on most benchmarks. The Yices solver is faster than the other
two on most examples. Z3 was the slowest on most, except a few (e.g., Benchmarks
3 and 6) where CVC4 was much slower. The running time, in general, increases
with the dimensionality, but this relationship is far from simple. For
example, the 84-dimensional Benchmark 24 was synthesized in less than 9 s by
both CVC4 and Yices, possibly because the safety specification is rather simple
for this problem.

The three solvers use different techniques for solving QF-LRA formulae with
support for incremental solving. The default tactic in Z3 is such that it spends
a large chunk of time when a constraint is pushed to the solver stack. In fact,
for Benchmark 24, while the other two solvers finish within 9 s, Z3 did not
finish pushing the constraints in the solver stack. When we disable incremental
solving in Z3, Benchmarks 22, 23 and 24 finish in about 650, 240 and 1800 s
respectively (marked with *). The number of iterations varies widely across
solvers, with CVC4 usually finishing in the fewest iterations. Despite the
larger number of satisfiability queries, Yices manages to finish close to CVC4
on most examples.


5 Conclusion 


We proposed a novel technique for synthesizing controllers for systems with
discrete time linear dynamics, operating under bounded disturbances, and for
reach-avoid specifications. Our approach relies on generating controllers that
combine an open-loop controller with a tracking controller, thereby allowing a
decoupled approach for synthesizing each component independently. Experimental
evaluation using our tool REALSYN demonstrates the value of the approach when
analyzing systems with complex dynamics and specifications.

There are several avenues for future work. These include synthesis of combined
controllers for nonlinear dynamical and hybrid systems, and for more
general temporal logic specifications. Generating witnesses to show the absence
of controllers is also an interesting direction.


Abstract. Asynchronous interactions are ubiquitous in computing systems
and complicate design and programming. Automatic construction of
asynchronous programs from specifications (“synthesis”) could
ease the difficulty, but known methods are complex and intractable
in practice. This work develops substantially simpler synthesis methods.
A direct, exponentially more compact automaton construction
is formulated for the reduction of asynchronous to synchronous synthesis.
Experiments with a prototype implementation of the new
method demonstrate feasibility. Furthermore, it is shown that for several
useful classes of temporal properties, automaton-based methods
can be avoided altogether and replaced with simpler Boolean constraint
solving.


1 Introduction 


Modern software and hardware systems harness asynchronous interactions to 
improve speed, responsiveness, and power consumption: delay-insensitive cir- 
cuits, networks of sensors, multi-threaded programs and interacting web services 
are all asynchronous in nature. Various factors contribute to asynchrony, such 
as unpredictable transmission delays, concurrency, distributed execution, and 
parallelism. The common result is that each component of a system operates 
with partial, out-of-date knowledge of the state of the others, which consid- 
erably complicates system design and programming. Yet, it is often easier to 
state the desired behavior of an asynchronous program. We therefore consider 
the question of automatically constructing (i.e., synthesizing) a correct reactive 
asynchronous program directly from its temporal specification. 

The asynchronous synthesis problem was originally formulated by Pnueli and
Rosner in 1989 on the heels of their work on synchronous synthesis [31,32].
The task is that of constructing a (finite-state) program which interacts
asynchronously with its environment while meeting a temporal specification on the
actions at the interface between program and environment. Given a linear
temporal specification $\varphi$, Pnueli-Rosner show that asynchronous synthesis
can be reduced to checking whether a derived specification $\varphi'$,
specifying the required behavior of the scheduler, is synchronously
synthesizable. That is, an asynchronous program can implement $\varphi$ iff a
synchronous program can implement $\varphi'$.

It may then appear straightforward to construct asynchronous programs
using one of the many tools that exist for synchronous synthesis. However, the
derived formula $\varphi'$ embeds a nontrivial stutter quantification, which
requires a complex intermediate automaton construction; it has not, to the
authors’ knowledge, ever been implemented. This situation is in stark contrast
to that of synchronous synthesis, for which multiple tools and algorithms have
been created.

Alternative methods have been proposed for asynchronous synthesis:
Finkbeiner and Schewe reduce a bounded form of the problem to a SAT/SMT
query [35], and Klein, Piterman and Pnueli show that some GR(1)
specifications¹ can be transformed as above to an approximate synchronous GR(1)
property [21,22]. These alternatives, however, have drawbacks of their own. The
SAT/SMT reduction is exponential in the number of interface (input and output)
bits, an important parameter; the GR(1) specifications amenable to
transformation are limited and are characterized by semantic conditions that are
not easily checked.

This work presents two key simplifications. First, we define a new property,
$PR(\varphi)$ (named in honor of Pnueli-Rosner’s pioneering work) which, like
$\varphi'$, is synchronously realizable if, and only if, $\varphi$ is
asynchronously realizable. We then present an automaton construction for
$PR(\varphi)$ that is direct and simpler, and results in an exponentially
smaller automaton than the one for $\varphi'$. In particular, the automaton for
$PR(\varphi)$ has at most twice the states of the automaton for $\varphi$, as
opposed to the exponential blowup of the state space (in the number of
interface bits) incurred in the construction of the automaton for $\varphi'$. As
almost all synchronous automaton-based synthesis tools use an explicit encoding
for automaton states, this reduction is vital in practice.

We show how to implement the transformation PR symbolically (with BDDs),
so that interface bits are always represented in symbolic form. One can then
apply the modular strategy of Pnueli-Rosner: a symbolic automaton for $\varphi$
is transformed to a symbolic automaton for $PR(\varphi)$ (instead of
$\varphi'$), which is analyzed with a synchronous synthesis tool. We establish
that PR is conjunctive and preserves safety². These are important properties,
used by tools such as Acacia+ [8] and Unbeast [11] to optimize the synchronous
synthesis task. The new construction has been implemented in a prototype tool,
BAS, and experiments demonstrate feasibility in practice.

In addition, we establish that for several classes of temporal properties, which
are easily characterized by syntax, the automaton-based method can be avoided
entirely and replaced with Boolean constraint solving. The constraints are
quantified Boolean formulae, with prefix $\exists\forall$ and a kernel that is
derived from the original specification. This surprising reduction, which
resolves a temporal problem with Boolean reasoning, is a consequence of the
highly adversarial role of the environment in the asynchronous setting.

These contributions turn a seemingly intractable synthesis task into one that
is feasible in practice.


¹ The GR(1) (“General Reactivity (1)”) subclass has an efficient symbolic
procedure for synchronous synthesis, formulated in [28] and implemented in
several tools.
² I.e., $PR(\bigwedge_i f_i) = \bigwedge_i PR(f_i)$, and PR(f) is a safety
property if f is a safety property.


2 Preliminaries 


Temporal Specifications. Linear Temporal Logic (LTL) [29] extends
propositional logic with temporal operators. LTL formulae are defined as
$\varphi ::= \mathrm{True} \mid \mathrm{False} \mid p \mid \neg\varphi \mid \varphi_1 \wedge \varphi_2 \mid \mathsf{X}\varphi \mid \varphi \,\mathsf{U}\, \varphi \mid \Diamond\varphi \mid \Box\varphi \mid \boxminus\varphi$.
Here p is a proposition, and $\mathsf{X}$ (Next), $\mathsf{U}$ (Until),
$\Diamond$ (Eventually), $\Box$ (Always), and $\boxminus$ (Always in the past)
are temporal operators. The LTL semantics is standard, and is in the full
version of the paper. For an LTL formula $\varphi$, let $L(\varphi)$ denote the
set of words (over subsets of propositions) that satisfy $\varphi$.

GR(1) is a useful fragment of LTL, where formulae are of the form
$(\Box S_e \wedge \bigwedge_i \Box\Diamond P_i) \Rightarrow (\Box S_s \wedge \bigwedge_i \Box\Diamond Q_i)$,
for propositional formulae $S_e$, $S_s$, $P_i$, $Q_i$.
Typically, the left-hand side of the implication is used to restrict the
environment, by requiring safety and liveness assumptions to hold, while the
right-hand side is used to define the safety and liveness guarantees required of
the system.

LTL specifications can be turned into equivalent Büchi automata, using
standard constructions. A Büchi automaton, A, is specified by the tuple
$(Q, Q_0, \Sigma, \delta, G)$, where Q is a set of states, $Q_0 \subseteq Q$
defines the initial states, $\Sigma$ is the alphabet,
$\delta \subseteq Q \times \Sigma \times Q$ is the transition relation, and
$G \subseteq Q$ defines the “green” (also known as “accepting” or “final”)
states. A run r of the automaton on an infinite word
$\sigma = a_0, a_1, \ldots$ over $\Sigma$ is an infinite sequence
$r = q_0, a_0, q_1, a_1, \ldots$ such that $q_0$ is an initial state, and for
each k, $(q_k, a_k, q_{k+1})$ is in the transition relation. Run r is accepting
if a green state appears on it infinitely often; the language of A, denoted
$L(A)$, is the set of words that have an accepting run.
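
The acceptance condition can be checked mechanically for ultimately periodic
words. The sketch below assumes the word has the form prefix followed by a
non-empty cycle repeated forever, and represents the transition relation as a
dictionary from (state, letter) to a set of successor states; the function name
and the example automaton (which accepts words containing infinitely many a's)
are illustrative only.

```python
def accepts_lasso(initial, delta, green, prefix, cycle):
    """Check whether a Büchi automaton accepts the word prefix . cycle^omega."""
    # States reachable after reading the finite prefix.
    current = set(initial)
    for letter in prefix:
        current = {q2 for q in current for q2 in delta.get((q, letter), ())}

    def successors(node):
        q, j = node
        return {(q2, (j + 1) % len(cycle)) for q2 in delta.get((q, cycle[j]), ())}

    def reachable(starts):
        seen, stack = set(), list(starts)
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(successors(node))
        return seen

    # Nodes pair an automaton state with a position inside the repeated cycle.
    nodes = reachable({(q, 0) for q in current})
    # Accept iff some reachable green node can return to itself.
    return any(q in green and (q, j) in reachable(successors((q, j)))
               for q, j in nodes)


# Example automaton: state 1 is entered exactly after reading an 'a'.
delta = {(0, "a"): {1}, (0, "b"): {0}, (1, "a"): {1}, (1, "b"): {0}}
infinitely_many_a = accepts_lasso({0}, delta, {1}, prefix="", cycle="ab")
finitely_many_a = accepts_lasso({0}, delta, {1}, prefix="a", cycle="b")
```

A green state occurs infinitely often on some run exactly when a green node of
this finite product graph lies on a reachable cycle, which is what the final
expression tests.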


The Asynchronous Synthesis Model. The goal of synthesis is to construct 
an “open” program M meeting a specification at its interface. In the asyn- 
chronous setting, the program M interacts in a fair interleaved manner with its 
environment E. The fairness restriction requires that E and M are each sched- 
uled infinitely often in all infinite executions. Let E//M denote this composition. 
The interface between E and M is formed by the variables x and y. Variable x
is written by E and is read-only for M, while y is written by M and is read-only
for E. One can consider x (resp., y) to represent a vector of variables, i.e.,
$x = (x_1, \ldots, x_n)$ (resp., $y = (y_1, \ldots, y_m)$), which is read (resp., written)
atomically. Many of our results also extend to non-atomic reads and writes; these
extensions are discussed in the full version of the paper.

The synthesis task is to construct a program M which satisfies a temporal
property $\varphi(x, y)$ over the interface variables in the composition E//M, for any
environment E. The most adversarial environment is the one which sets x to an
arbitrary value at each scheduled step; we denote it by CHAOS(x). The behaviors
of the composition CHAOS(x)//M simulate those of E//M for all E. Hence, it
suffices to produce M which satisfies $\varphi$ in the composition CHAOS(x)//M. One
can limit the set of environments through an assumption in the specification.

This leads to the formal definition of an asynchronous schedule, given by a
pair of functions, $r, w : \mathbb{N} \to \mathbb{N}$, which represent read and write points,
respectively. The initial write point is $w(0) = 0$, and represents the choice of
initial value for the variable y. Without loss of generality, the read-write
points alternate, i.e., for all $i \ge 0$, $w(i) \le r(i) < w(i+1)$ and
$r(i) < w(i+1) \le r(i+1)$. A strict asynchronous schedule does not allow read and
write points to overlap, i.e., the constraints are strengthened to
$w(i) < r(i) < w(i+1)$ and $r(i) < w(i+1) < r(i+1)$.
A tight asynchronous schedule is the strict schedule without any non-read-write
gaps, i.e., $r(k) = 2k + 1$ and $w(k) = 2k$, for all k. A synchronous schedule is the
special non-strict schedule where $r(i) = i$ and $w(i) = i$, for all i.
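
The four schedule classes can be checked mechanically on a finite prefix. The following is a minimal sketch, assuming a schedule prefix is given as two equal-length lists `r` and `w` of positions; the function names are illustrative and do not come from the paper.

```python
from typing import Sequence


def is_schedule(r: Sequence[int], w: Sequence[int]) -> bool:
    """Check w(0) = 0 and w(i) <= r(i) < w(i+1) <= r(i+1) on a finite prefix."""
    if len(r) != len(w) or not w or w[0] != 0:
        return False
    if any(w[i] > r[i] for i in range(len(w))):
        return False
    return all(r[i] < w[i + 1] for i in range(len(w) - 1))


def is_strict(r: Sequence[int], w: Sequence[int]) -> bool:
    """Strict: read and write points never coincide, i.e. w(i) < r(i)."""
    return is_schedule(r, w) and all(w[i] < r[i] for i in range(len(w)))


def is_tight(r: Sequence[int], w: Sequence[int]) -> bool:
    """Tight: r(k) = 2k + 1 and w(k) = 2k, so there are no idle positions."""
    return len(r) == len(w) and all(
        r[k] == 2 * k + 1 and w[k] == 2 * k for k in range(len(w))
    )


def is_synchronous(r: Sequence[int], w: Sequence[int]) -> bool:
    """Synchronous: r(i) = w(i) = i."""
    return len(r) == len(w) and all(r[i] == i and w[i] == i for i in range(len(w)))
```

For instance, `r = [2, 6]`, `w = [0, 4]` is strict but not tight, since positions 1, 3 and 5 are neither read nor write points.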

Let $D^v$ denote the binary domain {True, False} for a variable v. A program
M can be represented semantically as a function $f : (D^x)^* \to D^y$. For an
asynchronous schedule (r,w), a sequence $\sigma \in (D^x \times D^y)^\omega$ is said to be an
asynchronous execution of f over (r,w) if the value of y is changed only at
writing points, in a manner that depends only on the values of x at prior reading
points. Formally, for all $i \ge 0$, $y_{w(i+1)} = f(x_{r(0)} \ldots x_{r(i)})$, and for all j
such that $w(i) \le j < w(i+1)$, $y_j = y_{w(i)}$. The initial value of y is the value it
has at point $w(0) = 0$. The set of such sequences is denoted as asynch(f). Over
synchronous schedules, the set of such sequences is denoted by synch(f). Function
f is an asynchronous implementation of $\varphi$ if all asynchronous executions of f
over all possible schedules satisfy $\varphi$, i.e., if asynch(f) $\subseteq \mathcal{L}(\varphi)$.
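
To make the definition concrete, here is a minimal sketch that computes the output values of an asynchronous execution over a finite prefix. It assumes that the initial output is `f(())`, that the schedule prefix starts with `w[0] == 0`, and that inputs and outputs are single Booleans; `async_execution` and `replay` are hypothetical names used only for this illustration.

```python
from typing import Callable, List, Sequence, Tuple

Impl = Callable[[Tuple[bool, ...]], bool]


def async_execution(
    f: Impl, xs: Sequence[bool], r: Sequence[int], w: Sequence[int]
) -> List[bool]:
    """Return y_0 .. y_{len(xs)-1} for implementation f under the schedule (r, w).

    At the i-th writing point the output becomes f(x_{r(0)}, ..., x_{r(i-1)});
    between writing points the output is held constant.
    """
    write_index = {pos: i for i, pos in enumerate(w)}
    ys: List[bool] = []
    current = f(())  # assumption: the initial value is f applied to no inputs
    for j in range(len(xs)):
        i = write_index.get(j)
        if i is not None:
            current = f(tuple(xs[r[k]] for k in range(i)))
        ys.append(current)
    return ys


def replay(reads: Tuple[bool, ...]) -> bool:
    """Moore-style implementation: output the most recently read input."""
    return reads[-1] if reads else False
```

With `xs = [False, True, False, False, True, True]`, the tight schedule `r = [1, 3, 5]`, `w = [0, 2, 4]` yields `[False, False, True, True, False, False]`, whereas the synchronous schedule yields the input delayed by one step.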

This formulation agrees with that given by Pnueli and Rosner for strict sched- 
ules. For synchronous schedules (and other non-strict schedules), our formulation 
has a Moore-style semantics — the output depends on strictly earlier inputs — 
while Pnueli and Rosner formulate a Mealy semantics. A Moore semantics is 
more appropriate for modeling software programs, where the output variable is 
part of the state, and fits well with the theoretical constructions that follow. 


Definition 1 (Asynchronous LTL Realizability). Given an LTL property
$\varphi(x, y)$ over the input variable x and output variable y, the asynchronous LTL
realizability problem is to determine whether there is an asynchronous
implementation for $\varphi$.


Definition 2 (Asynchronous LTL Synthesis). Given a realizable LTL formula
$\varphi$, the asynchronous LTL synthesis problem is to construct an asynchronous
implementation of $\varphi$.


Examples. Pnueli and Rosner give a number of interesting specifications. The
specification $\Box(y \equiv \mathsf{X}x)$ ("the current output equals the next input") is
satisfiable but not realizable, as any implementation would have to be
clairvoyant. On the other hand, the flipped specification $\Box(x \equiv \mathsf{X}y)$ ("the
next output equals the current input") is synchronously realizable by a Moore
machine which replays the current input as the next output. The specification
$\Diamond\Box x \equiv \Diamond\Box y$ is synchronously realizable by the same machine, but is
asynchronously unrealizable, as shown next. Consider two input (x) sequences,
under a schedule where reads happen only at odd positions. In both, let x=true at
all reading points. Then any program must respond to both inputs with the same
output sequence for y. Now suppose that in the first sequence x is false at all
non-read positions, while in the second, x is true at all non-read positions. In
the first case, the specification forces the output y-sequence to be false
infinitely often; in the second, y is forced to be true from some point on, a
contradiction.

The negated specification $\Diamond\Box x \not\equiv \Diamond\Box y$ is also asynchronously
unrealizable, for the same reason. This "gap" illustrates an intriguing
difference from the synchronous case, where either a specification is realizable
for the system, or its negation is realizable for the environment. The two halves
of the equivalence, i.e., $\Diamond\Box x \Rightarrow \Diamond\Box y$ and $\Diamond\Box y \Rightarrow \Diamond\Box x$, are
individually asynchronously realizable, by strategies that fix the output to
y=true and to y=false, respectively.


From Asynchronous to Synchronous Synthesis. Pnueli and Rosner 
reduced asynchronous LTL synthesis to synchronous synthesis of Büchi objec- 
tives. Their reduction applied to LTL formulas with a single input and out- 
put variable [32]; it was later extended to the non-atomic case [30]. The orig- 
inal Rosner-Pnueli reduction deals exclusively with strict schedules, since they 
showed that it is sufficient to consider only strict schedules. 

Two infinite sequences are said to be stuttering equivalent if one sequence
can be obtained from the other by a finite duplication ("stretching") of a
given state or by deletion ("compressing") of finitely many contiguous identical
states, retaining at least one of them. The stuttering quantification
$\exists^{\approx}$ is defined as follows: $\exists^{\approx} x.\varphi$ holds for sequence $\pi$ if
$\exists x.\varphi$ holds for a sequence $\pi'$ that is stuttering equivalent to $\pi$.
Pnueli and Rosner showed that an LTL formula $\varphi(x, y)$ over input x and output y
is asynchronously realizable iff a "kernel" formula (this is the precise formula
referred to as $\varphi'$ in the Introduction)
$K(r, w, x, y) = \alpha(r, w) \Rightarrow \beta(r, w, x, y)$ over read sequence r, write
sequence w, input sequence x and output sequence y is synchronously realizable:


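$\alpha(r,w) = (\neg r \wedge \neg w \,\mathsf{U}\, r) \wedge \Box\neg(r \wedge w) \wedge \Box(r \Rightarrow (r \,\mathsf{U}\, (\neg r) \,\mathsf{U}\, w))$

$\qquad\qquad \wedge\ \Box(w \Rightarrow (w \,\mathsf{U}\, (\neg w) \,\mathsf{U}\, r))$

$\beta(r,w,x,y) = \varphi(x,y) \wedge \forall a.\,\Box((y = a) \Rightarrow ((y = a) \,\mathsf{U}\, (\neg w \wedge (y = a) \,\mathsf{U}\, w)))$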
AV* a! (L3 (^r r U (x = 2’)) plz’, y)) 


Here, $\alpha$ encodes the strict scheduling constraints on read and write points,
while $\beta$ encodes conditions which assure a correct asynchronous execution over
(r, w). The $\forall^{\approx}$ quantification, intuitively, quantifies over all adversarial
schedules similar to the current (r, w): it requires $\varphi$ to hold over all
sequences obtained from the current sequence $\sigma$ by stretching or compressing the
segments between read and write points, and choosing different values for x on
those segments.


3 Symbolic Asynchronous Synthesis


Pnueli and Rosner's procedure for asynchronous synthesis [32] is as follows: first,
a Büchi automaton is built for the kernel formula $\neg K$. This automaton is then
determinized and complemented to form a deterministic word automaton for $K$,
which is then re-interpreted as a tree automaton and tested for non-emptiness.
The transformations use standard constructions, except for the interpretation
of the $\exists^{\approx}$ operator in the formation of the Büchi automaton for $\neg K$. For a
Büchi automaton $A$, an automaton for $\exists^{\approx}\mathcal{L}(A)$ is constructed in two steps:
first applying a "stretching" transformation on $A$, followed by a "compressing"
transformation. Stretching introduces new automaton states of the form $(q, a)$,
for each state $q$ of $A$ and each letter $a$.

When this general construction is applied to the formula $\neg K$, the alphabet
of the automaton $A$ is formed of all possible valuations of the pair of variables
(x, y), which has size exponential in the number of interface bits. The stretching
step introduces a copy of an automaton state for each letter, which results in an
exponential blow-up of the state space of the constructed automaton. As all
current tools for synchronous synthesis represent automaton states explicitly
(see footnote 3), the exponential blowup introduced by the stuttering
quantification is a significant obstacle to implementation.

In Pnueli-Rosner’s construction, the determinization and complementation 
steps are also complex, utilizing Safra’s construction. These steps are simplified 
by the “Safraless” procedure adopted in current tools for synchronous synthesis. 

The other major issue with the Pnueli-Rosner construction is that the kernel 
formula K introduces the scheduling variables r, w as input variables. However, 
the actions of a synthesized program should not rely on the values of these 
variables. Pnueli-Rosner ensure this by checking satisfiability over “canonical” 
tree models; it is unclear, however, how to realize this effect using a synchronous 
synthesis tool as a black box. 

We define a new property, PR($\varphi$), that differs from $K$ but, similarly, is
synchronously realizable if, and only if, $\varphi$ is asynchronously realizable. We
then present an automaton construction for PR($\varphi$) that bypasses the general
construction for $\exists^{\approx}$, avoiding the exponential blowup and resulting in an
automaton with at most twice the states of the original. Moreover, this
construction refers only to x and y, avoiding the second issue as well. We then
show that this construction can be implemented fully symbolically.


3.1 Basic Formulations and Properties 


As formulated in Sect. 2, an asynchronous execution of f is determined by the
schedule (r, w). For a strict schedule, any infinite sequence representing an
asynchronous behavior of f over (r, w) may be partitioned into a sequence of
blocks, as follows. The start of the i'th block is at the i'th writing point,
$w(i)$, and it ends just before the (i+1)'st writing point, $w(i+1)$. The schedule
ensures that the i'th block includes the i'th reading point, $r(i)$, associated
with the input-output value $(x_i, y_i)$. As the value of y changes only at writing
points, $y_i$ is constant in the i'th block. Thus, the i'th block follows the
pattern $(\bot, y_i)^* (x_i, y_i) (\bot, y_i)^*$, where $\bot$ denotes an arbitrary choice of
x-value. Figure 1 illustrates a strict asynchronous computation and its
decomposition into blocks.


[Figure: timelines of input x and output y under a strict schedule]

Fig. 1. A strict asynchronous computation for f. Values of x at non-reading points are
shown as dotted. The y-value is constant between writing points, illustrated by a solid
rectangle. Blocks are shown as dashed rectangles.


3 With one exception: BoSy's DQBF procedure is fully symbolic but does not work
as well as the default QBF procedure [12].


Expansions. The set of expansions of sequence $\delta = (x_0, y_0)(x_1, y_1) \ldots$ consists
of all sequences obtained by simultaneously replacing each $(x_i, y_i)$ in $\delta$ by a
block with the pattern $(\bot, y_i)^* (x_i, y_i) (\bot, y_i)^*$. Formally, given sequences
$\delta = (x_0, y_0)(x_1, y_1) \ldots$ and $\sigma = (\bar{x}_0, \bar{y}_0)(\bar{x}_1, \bar{y}_1) \ldots$,
$\delta$ expands to $\sigma$, denoted as $\delta \,\mathit{exp}\, \sigma$, if there exists an asynchronous
schedule (r, w) for which $\sigma$ is an execution that is a block pattern of $\delta$,
i.e., for all i, $x_i = \bar{x}_{r(i)}$ and $y_i = \bar{y}_{r(i)}$, and for all j with
$w(i) \le j < w(i+1)$ it is the case that $\bar{y}_j = \bar{y}_{w(i)}$. The inverse relation
(read as contracts to) is denoted by $\mathit{exp}^{-1}$. Figure 2 shows the synchronous
computation that contracts the computation shown in Fig. 1.
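
The expansion relation can be checked directly on finite prefixes. The sketch below assumes sequences are lists of `(x, y)` pairs, that `(r, w)` is a valid schedule prefix with one read and one write point per letter of the contracted sequence, and that the last block extends to the end of the given prefix; the names `expands_to` and `contract` are illustrative only.

```python
from typing import List, Sequence, Tuple

Letter = Tuple[object, object]  # (x-value, y-value)


def expands_to(
    delta: Sequence[Letter], sigma: Sequence[Letter],
    r: Sequence[int], w: Sequence[int],
) -> bool:
    """Check on a finite prefix that sigma is a block pattern of delta under (r, w)."""
    n = len(delta)
    if len(r) != n or len(w) != n:
        return False
    for i in range(n):
        # The i-th reading point carries the i-th letter of delta.
        if sigma[r[i]] != delta[i]:
            return False
        # The y-value is constant throughout the i-th block.
        end = w[i + 1] if i + 1 < n else len(sigma)
        if any(sigma[j][1] != delta[i][1] for j in range(w[i], end)):
            return False
    return True


def contract(sigma: Sequence[Letter], r: Sequence[int]) -> List[Letter]:
    """Keep only the letters at reading points (the inverse of expansion)."""
    return [sigma[p] for p in r]
```

For example, `[(1, 'a'), (0, 'b')]` expands to `[(0, 'a'), (1, 'a'), (1, 'a'), (0, 'b'), (1, 'b')]` under `w = [0, 3]`, `r = [1, 3]`; the x-values at non-reading positions are unconstrained.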


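Relational Operators. For a relation R, the modal operators $\langle R \rangle$ and $[R]$ are
defined as follows. For any set S,


$u \in \langle R \rangle S \;=\; (\exists v : u R v \wedge v \in S)$ $\qquad$ $u \in [R] S \;=\; (\forall v : u R v \Rightarrow v \in S)$


By definition, the operators are negation duals, i.e., $\neg\langle R \rangle(\neg S) = [R](S)$ for
any R and any S. For an LTL formula $\varphi$ and a relation R over infinite sequences,
we let $\langle R \rangle\varphi$ abbreviate $\langle R \rangle(\mathcal{L}(\varphi))$, and similarly, let $[R]\varphi$
abbreviate $[R](\mathcal{L}(\varphi))$.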


Galois Connections. Given partial orders $(A, \le_A)$ and $(B, \le_B)$, a pair of
functions $g : A \to B$ and $h : B \to A$ form a Galois connection if, for all
$a \in A$, $b \in B$: $g(a) \le_B b$ is equivalent to $a \le_A h(b)$. From the definitions,
it is clear that the operators $(\langle R^{-1} \rangle, [R])$ form a Galois connection over the
partial orders defined by the subset relation. That is, for any sets S and T:
$\langle R^{-1} \rangle S \subseteq T$ iff $S \subseteq [R]T$.
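
The Galois connection can be confirmed by brute force on a small finite relation. This is only a sanity-check sketch under the assumption of a finite universe; the helper names are made up for the illustration.

```python
from itertools import combinations
from typing import Iterator, Set, Tuple

Rel = Set[Tuple[int, int]]


def diamond(R: Rel, S: Set[int]) -> Set[int]:
    """<R>S: elements with at least one R-successor in S."""
    return {u for (u, v) in R if v in S}


def box(R: Rel, S: Set[int], universe: Set[int]) -> Set[int]:
    """[R]S: elements all of whose R-successors are in S."""
    return {u for u in universe if all(v in S for (u2, v) in R if u2 == u)}


def inverse(R: Rel) -> Rel:
    return {(v, u) for (u, v) in R}


def subsets(universe: Set[int]) -> Iterator[Set[int]]:
    items = sorted(universe)
    return (set(c) for k in range(len(items) + 1) for c in combinations(items, k))


def galois_holds(R: Rel, universe: Set[int]) -> bool:
    """Check <R^-1>S <= T  iff  S <= [R]T for every pair of subsets S, T."""
    r_inv = inverse(R)
    return all(
        (diamond(r_inv, S) <= T) == (S <= box(R, T, universe))
        for S in subsets(universe)
        for T in subsets(universe)
    )
```

The same helpers also confirm the negation duality: the complement of `diamond(R, universe - S)` equals `box(R, S, universe)`.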

We first establish that the asynchronous executions of f are precisely the
synchronous executions of f under an inverse expansion.


Theorem 1. For an implementation f, asynch(f) $= \langle \mathit{exp}^{-1} \rangle$ synch(f).


[Figure: timelines of input x and output y with read and write points coinciding at 0, 1, 2]

Fig. 2. The contracted synchronous (Moore) computation


Proof. (ping) Let $\sigma$ be an execution in asynch(f), generated for some schedule
(r, w). For any k, consider the k'th block of $\sigma$. This is the set of positions from
$w(k)$ to $w(k+1) - 1$, which includes the k'th reading point $r(k)$, say with the
value $(x_k, y_k)$. Then the block follows the pattern $(\bot, y_k)^* (x_k, y_k) (\bot, y_k)^*$. So
$\sigma$ is an expansion of the sequence $\delta = (x_0, y_0)(x_1, y_1) \ldots$. By the definition of
an asynchronous execution, the value $y_{k+1} = f(x_0, \ldots, x_k)$. This is precisely the
requirement for $\delta$ to be a synchronous execution of f. Hence, there is a $\delta$ such
that $\delta \,\mathit{exp}\, \sigma$ and $\delta \in$ synch(f). Therefore, $\sigma \in \langle \mathit{exp}^{-1} \rangle$ synch(f).
(pong) Let $\sigma$ be in $\langle \mathit{exp}^{-1} \rangle$ synch(f). By definition, there is a synch(f)
execution $\delta = (x_0, y_0)(x_1, y_1) \ldots$ such that $\delta \,\mathit{exp}\, \sigma$. As $\delta$ is a synchronous
execution of f, the value $y_{k+1} = f(x_0, x_1, \ldots, x_k)$, for all k. Then $\sigma$ is an
asynchronous execution of f under the schedule where the k'th reading point is
the point that the k'th entry, $(x_k, y_k)$, from $\delta$ is mapped to in $\sigma$, and the
(k+1)'st writing point is the first point of the (k+1)'st block in the expansion.


We now use the Galois connection to show how asynchronous synthesis can
be reduced to an equivalent synchronous synthesis task. Consider a property $\varphi$
that must hold asynchronously for an implementation f.


Theorem 2. Let f be an implementation function, and $\varphi$ a property. Then
asynch(f) $\subseteq \mathcal{L}(\varphi)$ if, and only if, synch(f) $\subseteq [\mathit{exp}]\varphi$.

Proof. From Theorem 1, asynch(f) $\subseteq \mathcal{L}(\varphi)$ holds iff
$\langle \mathit{exp}^{-1} \rangle$ synch(f) $\subseteq \mathcal{L}(\varphi)$ does. By the Galois connection, this is
equivalent to synch(f) $\subseteq [\mathit{exp}]\varphi$.


3.2 The Pnueli-Rosner Closure 


We refer to the property $[\mathit{exp}]\varphi$ as the Pnueli-Rosner closure of $\varphi$, in honor
of their pioneering work on this problem, and denote it by PR($\varphi$). This has
interesting mathematical properties, which are useful in practice.


Theorem 3. PR($\varphi$) $= [\mathit{exp}]\varphi$ has the following properties.


1. (Closure) PR is monotonic and a downward closure, i.e., PR($\varphi$) $\subseteq \mathcal{L}(\varphi)$
2. (Conjunctivity) PR is conjunctive, i.e., PR($\bigwedge_i \varphi_i$) $= \bigwedge_i$ PR($\varphi_i$)
3. (Safety Preservation) If $\varphi$ is a safety property, so is PR($\varphi$)

The closure property relies on the reflexivity and transitivity of $\mathit{exp}$, and that
$[R]$ is monotonic for every R. Conjunctivity follows from the conjunctivity of $[R]$
for any R. Safety preservation is based on the Alpern-Schneider [4] formulation
of safety over infinite words. Proofs are in the full version of the paper.

Conjunctivity is exploited by the tools Acacia+ [8] and Unbeast [11] to optimize
the synchronous synthesis procedure. The Unbeast tool also separates out
safety from non-safety sub-properties to optimize the synthesis procedure. Thus,
if a specification $\varphi$ has the form $\varphi_1 \wedge \varphi_2$, where $\varphi_1$ is a safety property, then
PR($\varphi$) $=$ PR($\varphi_1$) $\cap$ PR($\varphi_2$) also denotes the intersection of the safety property
PR($\varphi_1$) with another property.


3.3 The Closure Automaton Construction 


By negation duality, PR($\varphi$) equals $\neg\langle \mathit{exp} \rangle(\neg\varphi)$. We use this property to
reduce asynchronous to synchronous synthesis, as follows.


1. Construct a non-deterministic Büchi automaton $A$ for $\neg\varphi$,

2. Transform $A$ to a non-deterministic Büchi automaton $B$ for the negated
Pnueli-Rosner closure of $\varphi$, i.e., the language of $B$ is
$\langle \mathit{exp} \rangle \mathcal{L}(A) = \langle \mathit{exp} \rangle(\neg\varphi)$,

3. Consider the structure of $B$ as that of a universal co-Büchi automaton, which
has language $\neg\mathcal{L}(B)$,

4. Synthesize an implementation f in the synchronous model which satisfies
$\neg\mathcal{L}(B) = \neg\langle \mathit{exp} \rangle \mathcal{L}(A) = \neg\langle \mathit{exp} \rangle(\neg\varphi) = [\mathit{exp}]\varphi =$ PR($\varphi$).


The new step is the second one, which constructs $B$ from $A$; the others use
standard constructions and tools. This construction is as follows.


- The states and alphabet of $B$ are the states and alphabet of $A$.

- The transitions of $B$ are determined by a saturation procedure. For every
pair of states $q$, $q'$, and letter $(x, y)$, let $\Pi(q, (x, y), q')$ be the set of paths
in $A$ from $q$ to $q'$ where the sequence of letters on the path matches the
expansion pattern $(\bot, y)^* (x, y) (\bot, y)^*$. The transition $(q, (x, y), q')$ is in $B$ if,
and only if, this set is non-empty.

- If some path in $\Pi(q, (x, y), q')$ passes through a green (accepting) state of $A$,
the transition $(q, (x, y), q')$ in $B$ is colored "green" and that path is assigned
as the witness to the transition in $B$. On the other hand, if none of the paths
in $\Pi(q, (x, y), q')$ pass through a green state, this transition is not colored in
$B$, and one of the paths in the set is chosen as the witness for this transition.

- The automaton $B$ inherits the accepting ("green") states of $A$ and it may
have, in addition, green transitions introduced as defined above.

- A sequence is accepted by $B$ if there is a run of $B$ on the sequence such
that either there are infinitely many green states, or infinitely many green
transitions on that run.
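
The saturation step can be sketched with explicit sets instead of BDDs. The code below is an illustration only, not the BAS implementation: it assumes that $A$ is given as a set of states, a set of transitions `(q, (x, y), q2)` and a set of green states, and it counts the endpoints of a path as lying on the path when deciding whether a transition is green. The function names are hypothetical.

```python
from collections import deque


def fixed_y_reach(delta, states, y):
    """reach[q] = states reachable from q by zero or more transitions with output y."""
    succ = {q: set() for q in states}
    for (q, (_x, yy), q2) in delta:
        if yy == y:
            succ[q].add(q2)
    reach = {}
    for q in states:
        seen, todo = {q}, deque([q])
        while todo:
            u = todo.popleft()
            for v in succ[u]:
                if v not in seen:
                    seen.add(v)
                    todo.append(v)
        reach[q] = seen
    return reach


def closure_automaton(states, delta, green):
    """Return (transitions of B, green transitions of B) by saturating A."""
    ys = {y for (_q, (_x, y), _q2) in delta}
    reach = {y: fixed_y_reach(delta, states, y) for y in ys}
    delta_b, green_edges = set(), set()
    for (r, (x, y), r2) in delta:  # the middle (x, y) step of the block pattern
        R = reach[y]
        for q in states:
            if r not in R[q]:
                continue  # no fixed-y prefix from q to r
            pre_green = any(g in R[q] and r in R[g] for g in green)
            for q2 in R[r2]:  # fixed-y suffix from r2 to q2
                post_green = any(g in R[r2] and q2 in R[g] for g in green)
                delta_b.add((q, (x, y), q2))
                if pre_green or post_green:
                    green_edges.add((q, (x, y), q2))
    return delta_b, green_edges
```

On a three-state chain `0 -(0,'a')-> 1 -(1,'a')-> 2 -(1,'b')-> 0` with green state 1, saturation adds, among others, the shortcut transition `(0, (1, 'a'), 2)`, which is green because its witness path passes through state 1.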


We establish that $\mathcal{L}(B) = \langle \mathit{exp} \rangle \mathcal{L}(A)$ through the following two lemmas.


Lemma 1. $\langle \mathit{exp} \rangle \mathcal{L}(A) \subseteq \mathcal{L}(B)$.

Proof. Let $\delta = (x_0, y_0)(x_1, y_1) \ldots$ be a sequence in $\langle \mathit{exp} \rangle \mathcal{L}(A)$. By definition,
there exists a sequence $\sigma$ in $\mathcal{L}(A)$ such that $\delta \,\mathit{exp}\, \sigma$. The expansion $\sigma$ follows
the pattern $[(\bot, y_0)^* (x_0, y_0) (\bot, y_0)^*][(\bot, y_1)^* (x_1, y_1) (\bot, y_1)^*] \ldots$ where $[\ldots]$
are used merely to indicate the boundaries of a block. An accepting run of $A$ on
$\sigma$ has the form $q_0 [(\bot, y_0)^* (x_0, y_0) (\bot, y_0)^*] q_1 [(\bot, y_1)^* (x_1, y_1) (\bot, y_1)^*] q_2 \ldots$,
where the states on the run inside a block have been elided. By the definition of
$B$, the segment $q_0 (\bot, y_0)^* (x_0, y_0) (\bot, y_0)^* q_1$ induces a transition from $q_0$ to $q_1$
in $B$ on the letter $(x_0, y_0)$. Similarly, the following segment induces a transition
from $q_1$ to $q_2$ on letter $(x_1, y_1)$, and so forth. These transitions together form a
run $q_0 (x_0, y_0) q_1 (x_1, y_1) q_2 \ldots$ of $B$ on $\delta$.

If one of the $\{q_i\}$ is green and appears infinitely often on the run on $\sigma$, the
induced run on $\delta$ is accepting. Otherwise, as the run on $\sigma$ is accepting, some
green state of $A$ occurs in the interior of infinitely many segments of that run.
The transitions of $B$ induced by those segments must be green, so the
corresponding run on $\delta$ has infinitely many green edges, and is accepting for $B$.


Lemma 2. $\mathcal{L}(B) \subseteq \langle \mathit{exp} \rangle \mathcal{L}(A)$.


Proof. Let $\delta$ be accepted by $B$. We show that there is $\sigma$ such that $\delta \,\mathit{exp}\, \sigma$ and
$\sigma$ is accepted by $A$. Let $\delta$ have the form $(x_0, y_0)(x_1, y_1) \ldots$. Denote the
accepting run of $B$ on $\delta$ by $r = q_0 (x_0, y_0) q_1 (x_1, y_1) \ldots$. From the construction of
$B$, the transition from $q_0$ to $q_1$ on $(x_0, y_0)$ has an associated witness path through
$A$ from $q_0$ to $q_1$, which follows the expansion pattern $(\bot, y_0)^* (x_0, y_0) (\bot, y_0)^*$ on
its edge labels. Stitching together the witness paths for each transition of $r$, we
obtain both a sequence $\sigma$ that is an expansion of $\delta$ and a run $r'$ of $A$ on $\sigma$.

As $r$ is accepting for $B$, it must enter infinitely often either a green state or a
green edge. If it enters a green state infinitely often, that state appears
infinitely often on $r'$. If $r$ enters a green edge infinitely often, the witness
path for that edge contains a green state of $A$, say $q$; as this path is repeated
infinitely often on $\sigma$, $q$ appears infinitely often on $r'$. In either case, a green
state of $A$ appears infinitely often on $r'$, which is therefore an accepting run of
$A$ on $\sigma$.


Automaton $B$ can be placed in standard form by converting its green edges
to green states as follows, forming a new automaton, $\hat{B}$. Form a green copy of
the state space, i.e., for each state $q$, form a green variant, $G(q)$, which is
marked as an accepting state. Set up transitions as follows. If $(q, a, q')$ is an
original non-green transition, then $(q, a, q')$ and $(G(q), a, q')$ are new
transitions. If $(q, a, q')$ is an original green transition, then $(q, a, G(q'))$ and
$(G(q), a, G(q'))$ are new transitions. This at most doubles the size of the
automaton. It is straightforward to establish that $\mathcal{L}(\hat{B}) = \mathcal{L}(B)$.
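
A minimal sketch of this conversion, assuming the automaton with green edges is given by explicit sets and representing the green variant G(q) as the pair `(q, True)`; the function name and the tuple encoding are choices made for this illustration.

```python
def edges_to_states(states, init, delta_b, green_states, green_edges):
    """Convert green edges into green states; the result has at most 2|Q| states."""
    new_states = {(q, g) for q in states for g in (False, True)}
    new_init = {(q, False) for q in init}
    new_delta = set()
    for (q, a, q2) in delta_b:
        # A green edge leads to the green copy of its target, from either copy
        # of its source; a non-green edge leads to the plain copy.
        target_is_green_copy = (q, a, q2) in green_edges
        for g in (False, True):
            new_delta.add(((q, g), a, (q2, target_is_green_copy)))
    # A state is accepting if it is a green copy or was already green.
    new_green = {(q, g) for (q, g) in new_states if g or q in green_states}
    return new_states, new_init, new_delta, new_green
```

A run of the converted automaton visits a green copy exactly when the corresponding run of the original automaton has just taken a green edge, which is why the two acceptance conditions coincide.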


3.4 Symbolic Construction 


The symbolic construction of $B$ closely follows the definitions above. It is
easily implemented with BDDs representing predicates on the input and output
variables x and y. The crucial step is to use fixpoints to formulate the
existence of paths in the set $\Pi$ used in the definition of $B$. These definitions
are similar to the fixpoint definition of the CTL modality EF. We use
$A(q, (x, y), q')$ to denote the predicate on (x, y) describing the transition from
$q$ to $q'$ in automaton $A$.


Fixed Don't-Care Path. Let $\mathit{EfixedY}(q, y, q')$ hold if there is a path of length 0
or more from $q$ to $q'$ in $A$ where the value of y is fixed. This is the least
fixpoint (in $Z$) of the following implications:


- $(q = q') \Rightarrow Z(q, y, q')$, and
- $(\exists x, r : A(q, (x, y), r) \wedge Z(r, y, q')) \Rightarrow Z(q, y, q')$


The predicate $A^{\exists}(q, y, r) = (\exists x : A(q, (x, y), r))$ is pre-computed. Then, the
least fixpoint is computed iteratively as follows.


$\mathit{EfixedY}^{0}(q, y, q') = (q = q')$
$\mathit{EfixedY}^{i+1}(q, y, q') = \mathit{EfixedY}^{i}(q, y, q') \vee (\exists r : A^{\exists}(q, y, r) \wedge \mathit{EfixedY}^{i}(r, y, q'))$


Let predicate $\mathit{green}_A(r)$ be true for an accepting state $r$ of $A$. The predicate
$\mathit{Efixedgreen}(q, y, q')$ holds if there is a fixed y-path from $q$ to $q'$ where one of
the states on it is green:


$\mathit{Efixedgreen}(q, y, q') = (\exists r : \mathit{EfixedY}(q, y, r) \wedge \mathit{green}_A(r) \wedge \mathit{EfixedY}(r, y, q'))$
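
The iteration above can be mimicked with explicit sets of triples standing in for BDDs. This is a sketch under that assumption; `efixed_y` and `efixed_green` are illustrative names, and a real symbolic implementation would use relational products on BDDs rather than Python set comprehensions.

```python
def efixed_y(states, ys, delta):
    """Least fixpoint: triples (q, y, q2) with a fixed-y path of length >= 0 from q to q2."""
    # Pre-computed predicate with x quantified away.
    a_exists = {(q, y, r) for (q, (_x, y), r) in delta}
    z = {(q, y, q) for q in states for y in ys}  # iteration 0: q = q'
    while True:
        step = {
            (q, y, q2)
            for (q, y, r) in a_exists
            for (r2, y2, q2) in z
            if r2 == r and y2 == y
        }
        nxt = z | step
        if nxt == z:  # no new triples: fixpoint reached
            return z
        z = nxt


def efixed_green(z, green):
    """Triples (q, y, q2) connected by a fixed-y path that visits a green state."""
    return {
        (q, y, q2)
        for (q, y, r) in z
        if r in green
        for (r2, y2, q2) in z
        if r2 == r and y2 == y
    }
```

Since each iteration only adds triples and there are finitely many of them, the loop terminates after at most |Q| rounds for each value of y.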


Paths and Green Paths. Let $\mathit{Epath}(q, (x, y), q')$ hold if there is a path following
the block pattern $(\bot, y)^* (x, y) (\bot, y)^*$ from $q$ to $q'$ in $A$. Then,


$\mathit{Epath}(q, (x, y), q') = (\exists r, r' : \mathit{EfixedY}(q, y, r) \wedge A(r, (x, y), r') \wedge \mathit{EfixedY}(r', y, q'))$


Similarly, let $\mathit{Egreenpath}(q, (x, y), q')$ hold if there is a path following the block
pattern $(\bot, y)^* (x, y) (\bot, y)^*$ from $q$ to $q'$ in $A$, with an intermediate green state.


$\mathit{Egreenpath}(q, (x, y), q') =$
$\quad (\exists r, r' : \mathit{Efixedgreen}(q, y, r) \wedge A(r, (x, y), r') \wedge \mathit{EfixedY}(r', y, q')) \;\vee$
$\quad (\exists r, r' : \mathit{EfixedY}(q, y, r) \wedge A(r, (x, y), r') \wedge \mathit{Efixedgreen}(r', y, q'))$




State Space of $\hat{B}$. The state space of $\hat{B}$ is formed by pairs $(q, g)$, where $q$ is a
state of $A$ and $g$ is a Boolean indicating whether it is a new green state. The
accepting condition $\mathit{green}_{\hat{B}}(q, g)$ of $\hat{B}$ is given by $\mathit{green}_A(q) \vee g$.


Initial States. The initial predicate $I_{\hat{B}}(q, g)$ is $I_A(q) \wedge \neg g$, where $I_A(q)$ is
true for initial states of the input automaton $A$.


Transition Relation of $\hat{B}$. The transition relation $\hat{B}((q, g), (x, y), (q', g'))$ is


$\hat{B}((q, g), (x, y), (q', g')) = \mathit{Epath}(q, (x, y), q') \wedge (g' \equiv \mathit{Egreenpath}(q, (x, y), q'))$


4 Implementation and Experiments


The PR algorithm has been implemented in a framework called BAS (Bounded
Asynchronous Synthesis). It uses the LTL-to-automaton converter LTL3BA [3,
6], and follows the modular method, connecting to either of two solvers, BoSy [2,
12] and Acacia+ [1,8], to solve the synchronous realizability of PR($\varphi$). The PR
construction is implemented in about 1200 lines of OCaml, using an external
BDD library. (The core construction requires only about 400 lines of code.)
For an LTL specification $\varphi$, the BAS workflow for asynchronous synthesis is as
follows:


1. Check whether $\varphi$ is synchronously realizable; if not, return UNREALIZABLE.
2. Construct Büchi automata for $\neg\varphi$ and for $\varphi$.
3. Concurrently:
(a) Construct PR($\varphi$) from the automaton for $\neg\varphi$ and check whether it is
synchronously realizable; if so, return REALIZABLE and synthesize the
implementation.
(b) Construct PR($\neg\varphi$) from the automaton for $\varphi$ and check whether it is
synchronously realizable for the environment; if so, return UNREALIZABLE.
Upon termination of either check, terminate the other execution as well.


The synchronous synthesis tools successively increase a bound until a limit (com- 
puted based on automaton structure) is reached. Thus, in theory, only the check 
in step 3(a) is needed. However, the checks in steps 1 and 3(b) may allow the 
tool to terminate early (before reaching the limit bound), if a winning strategy 
for the environment can be discovered. 

To evaluate BAS we consider the list of examples presented in Table 1. The 
reported experiments were performed on a VM configured to have 8 CPU cores 
at 2.4GHz, 8GB RAM, running 64-bit Linux. The running times are reported in 
milliseconds. For each specification (presented in the second column) we report 
whether it is asynchronously realizable (third column), the time for the PR con- 
struction (our contribution), and the time for checking whether the specification 
is realizable using BoSy and Acacia+ solvers (resp., fifth and sixth columns). 

The first set of examples (Specifications 1-11) lists specifications discussed in
this paper and in related works. As a parameterized example, we consider two
variants of arbiter specifications. The arbiter has n inputs in which clients
request permissions, and n outputs in which the clients are granted permissions.
In both variants of the arbiter example, no two grants are allowed to be set
simultaneously. The first arbiter example (Specification 12) requires that
whenever an input request $r_i$ is set, the corresponding output grant $g_i$ must
eventually be set. The second variant (Specification 13) also requires that a
grant $g_i$ is set only if request $r_i$ is set as well. That is, in order for a client
to be granted a permission, its corresponding request must be constantly set.
Since the asynchronous case cannot observe the request in between read events,
this variant of the arbiter is not realizable. The results are shown for
n = 2, 4, 6. Note that the only comparable experimental evaluation is given in
[18], where they report that asynchronous synthesis of the first arbiter example
(Specification 12) takes over 8 h.

Table 1. BAS asynchronous synthesis runtime evaluation (times in milliseconds). We
let BoSy run up to 2 h, and Acacia+ up to 1000 iterations. "Na" denotes cases where
the executions did not find a winning strategy within these boundaries.


No.  Specification                                   Asyn.         PR constr.       Asyn. synthesis
                                                     realizable?                    BoSy               Acacia+
1    (x = y)                                         False         8                972                30
2    $\Diamond\Box x \equiv \Diamond\Box y$                          False         9                Na                 Na
3    $\Diamond\Box x \Rightarrow \Diamond\Box y$                     True          8                899                Na
4    $\Diamond\Box y \Rightarrow \Diamond\Box x$                     True          7                994                Na
5    (0z v OUnz) > oU = O0y                          True          13               1004               Na
6    ux (~r) U (~y)) > oUlz = O0y                    True          10               Na                 Na
7    O9 (x Ay) => (09y ^ DO)                         True          9                1053               30
8    O (x Vy) => (09y ^ L1Ó v)                       True          9                995                40
9    O9 (£) = (Doy ^UOo-v)                           True          8                934                30
10   O (x => Oy)                                     True          8                960                30
11   O (x > Oy) ^ O (29U x)                          False         10               1058               Na

Variants of parameterized arbiter (results shown are for n = 2; 4; 6)

12   $\bigwedge_{i \ne j} \Box(\neg g_i \vee \neg g_j) \wedge \bigwedge_i \Box(r_i \Rightarrow \Diamond g_i)$
                                                     True          11; 13; 75       854; 1146; 4965    Na; Na; Na
13   $\bigwedge_{i \ne j} \Box(\neg g_i \vee \neg g_j) \wedge \bigwedge_i \Box(r_i \Rightarrow \Diamond g_i) \wedge \bigwedge_i \Box(g_i \Rightarrow r_i)$
                                                     False         17; 3124; 2024K  1129; 362K; Na     Na; Na; Na


The second specification $\varphi$ is the one discussed in Sect. 2. It is surprisingly
difficult to solve. Both $\varphi$ and its negation are asynchronously unrealizable.
Moreover, $\varphi$ is synchronously realizable. Thus, the early detection tests (steps 1
and 3(b)) failed to discover a winning strategy for the environment; the bounded
synthesis tools increase the considered bound monotonically without converging
to an answer in a reasonable amount of time. This example highlights the need
for better tests for unrealizability. The results in the following section provide
simple QBF tests of unrealizability for subclasses of LTL.


5 Efficiently Solvable Subclasses of LTL 


The high complexity of direct LTL (synchronous) synthesis has encouraged the
search for general procedures that work well in practice, such as Safraless and
bounded synthesis [24,35]. Another useful direction has been to identify
fragments of LTL with efficient synthesis algorithms [5]. Among the most
noteworthy is the GR(1) subclass, for which there is an efficient, symbolic
synthesis procedure [28]. We explore this direction for asynchronous synthesis.
Surprisingly, we show that synthesis for certain fragments of LTL can be reduced
to Boolean reasoning over properties in QBF. The results cover several types of
GR(1) formulae, although the question of a reduction for all of GR(1) is open.

The QBF formulae that arise have the form $\exists y \forall x.\, p(x, y)$, where x and y
are disjoint sets of variables, and p is a propositional formula over x, y. An
assignment y = b for which $\forall x.\, p(x, b)$ holds is called a witness to the formula.
The first such reduction is for the property $\Box\Diamond P$.


Theorem 4. $\varphi = \Box\Diamond P$ is asynchronously realizable iff $\exists y \forall x.\, P$ is True.


Proof. (ping) Let b be a witness to $\exists y \forall x.\, P$. The function that constantly
outputs y = b satisfies $\varphi$ for any asynchronous schedule.

(pong) Let f be a candidate implementation function and suppose that
$\forall y \exists x.\, \neg P$ holds. Fix any schedule. For every value y = b that function f
outputs at a writing point, there exists an input value x = a such that
$\neg P(a, b)$ holds. Thus, the environment, by issuing x = a in the interval from the
current writing point (with y = b) up to the next one, can ensure that $\neg P$ holds
throughout the execution. Thus the specification $\varphi = \Box\Diamond P$ does not hold on this
execution.
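
For a handful of interface bits, the test of Theorem 4 can be run by plain enumeration. The sketch below assumes P is supplied as a Python predicate over tuples of Booleans; it is a brute-force stand-in for a QBF solver, and `exists_forall_witness` is a hypothetical helper name.

```python
from itertools import product
from typing import Callable, Optional, Tuple

Bits = Tuple[bool, ...]


def exists_forall_witness(
    p: Callable[[Bits, Bits], bool], n_x: int, n_y: int
) -> Optional[Bits]:
    """Return some y with p(x, y) true for every x, or None if there is none."""
    for y in product((False, True), repeat=n_y):
        if all(p(x, y) for x in product((False, True), repeat=n_x)):
            return y
    return None


# P = (x equivalent to y): no constant output agrees with every input,
# so the corresponding always-eventually property is asynchronously unrealizable.
no_witness = exists_forall_witness(lambda x, y: x[0] == y[0], 1, 1)

# P = (x or y): the constant output y = True is a witness.
witness = exists_forall_witness(lambda x, y: x[0] or y[0], 1, 1)
```

When a witness is returned, the constant function that always outputs it is an asynchronous implementation, as in the (ping) direction of the proof.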


The result in Theorem 4 applies to asynchronous synthesis, but does not
apply to synchronous synthesis. For example, the property $\Box\Diamond(x \equiv y)$ is
asynchronously unrealizable, as $\exists y \forall x.\,(x \equiv y)$ is False. On the other hand, it
is synchronously realizable with a Mealy machine that sets y to x at each point.

Theorem 4 extends easily to conjunction and disjunction of $\Box\Diamond$ properties.


Theorem 5. Specification $\varphi = \bigvee_{i=0}^{m} \Box\Diamond P_i$ is asynchronously realizable iff
$\exists y \forall x.\,(\bigvee_{i=0}^{m} P_i)$ holds. Additionally, specification
$\varphi = \bigwedge_{i=0}^{m} \Box\Diamond P_i$ is asynchronously realizable iff for all
$i \in \{0, 1, \ldots, m\}$, $\exists y \forall x.\, P_i$ holds.


Proof. The first claim follows directly from the identity
$\bigvee_{i=0}^{m} \Box\Diamond P_i = \Box\Diamond(\bigvee_{i=0}^{m} P_i)$ and Theorem 4.

For the second, for each i, let $y = b_i$ be an assignment such that $\forall x.\, P_i(x, b_i)$
holds. The function that generates the sequence $b_0, b_1, \ldots, b_m$, ad infinitum, is an
asynchronous implementation of $\bigwedge_i \Box\Diamond P_i$. On the other hand, suppose that
for some i, $\forall y \exists x.\, \neg P_i$ holds; then following the construction from Theorem 4,
one can define an execution where $P_i$ is always False.


Theorem 6. $\varphi = \Diamond\Box P$ is asynchronously realizable iff $\exists y \forall x.\, P$ is True.


The proof is similar to that for Theorem 4. Theorem 6 also extends to
conjunctions and disjunctions of $\Diamond\Box$ properties, by arguments similar to those
for Theorem 5. Namely, $\bigwedge_{i=0}^{m} \Diamond\Box P_i$ is asynchronously realizable iff
$\exists y \forall x.\,(\bigwedge_{i=0}^{m} P_i)$ is True, and $\bigvee_{i=0}^{m} \Diamond\Box P_i$ is asynchronously
realizable iff for some $i \in \{0, 1, \ldots, m\}$, $\exists y \forall x.\, P_i$ is True. Theorems 4-6
apply to non-atomic reads and writes of multiple input and output variables.
Proofs are in the full version of the paper.

We now consider a more general type of GR(1) formula. The strict semantics
of the GR(1) formula $\Box S_e \wedge \Box\Diamond P \Rightarrow \Box S_s \wedge \Box\Diamond Q$ is defined to be
$\Box(\boxminus S_e \Rightarrow S_s) \wedge (\Box S_e \wedge \Box\Diamond P \Rightarrow \Box\Diamond Q)$, i.e., $S_s$ is required to hold so
long as $S_e$ has always held in the past; and if $S_e$ holds always and P holds
infinitely often, then Q holds infinitely often. This is the interpretation
supported by GR(1) synchronous synthesis tools.

Theorem 7. The strict semantics of GR(1) specification
$\Box S_e \wedge \Box\Diamond P \Rightarrow \Box S_s \wedge \Box\Diamond Q$ is asynchronously realizable iff
$\exists y \forall x.\,(S_e \Rightarrow (S_s \wedge (P \Rightarrow Q)))$ is True.


Proof. (ping) If $y = b$ is a witness to $\exists y \forall x.(S_e \Rightarrow (S_s \wedge (P \Rightarrow Q)))$, let f be a
function that always generates b. Suppose $S_e$ holds up to point i; then, as $y = b$,
regardless of the x-value, $S_s$ holds at point i. This shows that the first part of
the specification holds. For the second, suppose that $S_e$ holds always and P is
true infinitely often. Then, by choice of $y = b$, $(P \Rightarrow Q)$ holds always, thus Q
holds infinitely often as well.

(pong) To prove the other side of the implication, we proceed as in Theorem 4.
Let f be a candidate implementation. Fix a schedule, and suppose that
$\forall y \exists x.(S_e \wedge (\neg S_s \vee \neg(P \Rightarrow Q)))$ holds. Then for every step of the execution
and for every value $y = b$ that function f outputs at a writing point, there exists
a value $x = a$ which the environment can choose from that writing point to the
next such that $S_e(a, b)$ is true, and one of $S_s(a, b)$ or $(P \Rightarrow Q)(a, b)$ is false at
every point in that interval.

On this execution, $S_e$ holds throughout. If $S_s$ is false at some point, this
violates the first part of the specification. If not, then $(P \Rightarrow Q)$ must be false
everywhere; i.e., at every point P is true but Q is false. Thus, $S_e$ holds everywhere
and P holds infinitely often but Q does not hold infinitely often, violating the
second part of the specification.


Theorem 7 applies to atomic reads and writes, showing that asynchronous
synthesis of GR(1) specifications can be reduced to Boolean reasoning over
properties in QBF. For non-atomic reads and writes, safety in asynchronous systems
is more nuanced, since there is a delay between the write points of the first and
last outputs in each round. This is discussed in the full version of the paper.
This proof strategy does not generalize easily to the full GR(1) format, where
more than one $\Box\Diamond$ property can appear on either side of the implication.
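The condition of Theorem 7 can likewise be checked by enumeration when the input and output domains are small. The sketch below assumes predicates given as Python functions over a toy domain; the function name `gr1_async_witness` is hypothetical, and a practical tool would hand the formula to a QBF solver instead.

```python
def gr1_async_witness(s_e, s_s, p, q, xs, ys):
    """Return y such that, for all x, S_e => (S_s and (P => Q)); else None."""
    def ok(x, y):
        return (not s_e(x, y)) or (s_s(x, y) and ((not p(x, y)) or q(x, y)))

    for y in ys:
        if all(ok(x, y) for x in xs):
            return y
    return None


values = [0, 1]                 # toy domain for both x and y
always = lambda x, y: True      # trivial safety assumptions and guarantees

# The guarantee Q := (y = 1) is under the program's control: y = 1 is a witness.
controlled = gr1_async_witness(
    always, always, lambda x, y: x == 1, lambda x, y: y == 1, values, values
)

# The guarantee Q := (x = y) depends on the unobserved input: no witness exists.
uncontrolled = gr1_async_witness(
    always, always, always, lambda x, y: x == y, values, values
)
```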

These results establish that the asynchronous synthesis problem for such
specifications is easily solvable (more easily than in the synchronous setting,
surprisingly), avoiding entirely the need for automaton constructions and bounded
synthesis. From another, equally valuable, point of view, the results show that
such types of specifications may be of limited interest for automated synthesis,
as solvable cases have very simple solutions.


6 Conclusions and Related Work 


This work tackles the task of asynchronous synthesis from temporal specifications.
The main results are a new symbolic automaton construction for general
temporal properties, and the reduction of the synthesis question for several
classes of specifications to QBF. These are mathematically interesting, being
substantial simplifications of prior methods. Moreover, they make it feasible to
implement an asynchronous synthesis tool following the modular process suggested
by Pnueli and Rosner in 1989, by reducing asynchronous synthesis to a
synchronous synthesis question. To the best of our knowledge, this is the first
such tool. The prototype, which builds on tools for synchronous synthesis, is able
to quickly synthesize asynchronous programs for several interesting properties.
There are, undoubtedly, several challenges, one of which is the quick detection
of unrealizable specifications.

Our work builds upon several earlier results, which we discuss here. The 
synthesis question for temporal properties originates from a question posed by 
Church in the 1950s (see [37]). The problem of synthesizing a synchronous reac- 
tive system from a linear temporal specification was formulated and studied by 
Pnueli and Rosner [31], who gave a solution based on non-emptiness of tree 
automata. There has been much progress on the synchronous synthesis question 
since. Key developments include the discovery of efficient symbolic (BDD-based) 
solutions for the GR(1) class [7,28], the invention of “Safraless” procedures [24], 
the application of these ideas for bounded synthesis [15,35], and their implementation
in a number of tools, e.g., [8,10,11,13,20,34]. These have been applied in
many settings (cf. [9,23,25-27]).

The problem of synthesizing asynchronous programs was also formulated and
studied by Pnueli and Rosner [32] but has proved to be much more challenging, 
with only limited progress. The original Pnueli-Rosner constructions are complex 
and were not implemented. Work by Klein, Piterman and Pnueli, nearly 20 years 
later [22], shows tractability for some GR(1) specifications. However, the class 
of specifications that can be so handled is characterized by semantic constraints 
such as stuttering-closure and memoryless-ness, which are difficult to recognize. 

Finkbeiner and Schewe [18,35] present an alternative method, based on 
bounded synthesis, that applies to all LTL properties: it encodes the existence 
of a deductive proof for a bounded program into SAT/SMT constraints. How- 
ever, the encoding represents inputs and outputs explicitly and is, therefore, 
exponential in the number of input and output bits. The exponential blowup 
has practical consequences: an asynchronous arbiter specification requires over 
8h to synthesize [18]; the same specification can be synthesized by our method 
in seconds. (Note, however, that the method in [18] is not specialized to asyn- 
chronous synthesis, and this difference may not be solely due to the explicit 
state representation, as the specification has only 4 bits.) Recent work gives an 
alternative encoding of synchronous bounded synthesis into QBF constraints, 
retaining input and output bits in symbolic form [12]. We believe that a similar 
encoding applies to asynchronous bounded synthesis as well; this is a topic for
future work. 

Pnueli and Rosner's model of interface communication is not the only choice. 
Other models for asynchrony could, for instance, be based on CCS/CSP-style 
rendezvous communication at the interface, or permit shared read-write variables 
with atomic lock/unlock actions. Petri net game models have also been suggested 
for distributed synthesis [16]. An orthogonal direction is to weaken the adversar- 
ial power of the environment through a probabilistic model which can be used to 
constrain unlikely, highly adversarial input patterns to have probability 0, thus 
turning the synthesis problem into one where programs satisfy their specifica- 
tions with high probability. (The synthesis of multiple processes is known to be 
undecidable in most cases [17,33].)

In the broader context of fully automatic program synthesis, there are various
approaches to the synthesis of single-threaded, terminating programs from formal
pre- and post-condition specifications and from examples, using type information
and other techniques to prune the search space. (We will not attempt to
survey this large field; some examples are [14,19,36].) An intriguing question is
to investigate how the techniques developed in these distinct lines of work can be
fruitfully combined to aid the development of asynchronous, reactive software.

Abstract. Automatic program synthesis promises to increase the productivity
of programmers and end-users of computing devices by
automating tedious and error-prone tasks. Despite the practical suc- 
cesses of program synthesis, we still do not have systematic frameworks 
to synthesize programs that are “good” according to certain metrics— 
e.g., produce programs of reasonable sizes or with good runtime—and 
to understand when synthesis can result in such good programs. In this 
paper, we propose QSYGUS, a unifying framework for describing syntax- 
guided synthesis problems with quantitative objectives over the syntax of 
the synthesized programs. QSYGUS builds on weighted (tree) grammars, 
a clean and foundational formalism that provides flexible support for 
different quantitative objectives, useful closure properties, and practical 
decision procedures. We then present an algorithm for solving QSyGuS.
Our algorithm leverages closure properties of weighted grammars to gen- 
erate intermediate problems that can be solved using non-quantitative 
SYGUS solvers. Finally, we implement our algorithm in a tool, QUASI, 
and evaluate it on 26 quantitative extensions of existing SYGUS bench- 
marks. QUASI can synthesize optimal solutions in 15/26 benchmarks 
with times comparable to those needed to find an arbitrary solution. 


1 Introduction 


The goal of program synthesis is to find a program in some search space that 
meets a specification—e.g., a set of examples or a logical formula. Recently, 
a large family of synthesis problems has been unified into a framework called 
syntax-guided synthesis (SyGuS). A SyGuS problem is specified by a context- 
free grammar describing the search space of programs, and a logical formula 
describing the specification. Many synthesizers now support this format [2] and 
annually compete in synthesis competitions [4]. Thanks to these competitions, 
these solvers are now quite mature and are finding wide application [14]. 

While the logical specification mechanism provided by SYGUS is powerful, 
it can only capture the functional requirements of the synthesis problem—e.g., 
the program should perform correctly on a given set of input/output examples. 
When multiple possible programs can satisfy the specification, SYGUS does not 
provide a way to prefer one to the other—e.g., one cannot ask a solver to return 
the program with the fewest if-statements. As a consequence, existing synthesis
tools do not provide guarantees about what solution is returned if multiple ones
exist. While a few synthesizers have attempted to include some form of specification
to express this kind of quantitative intent [7,15,16,19], these approaches
are domain-specific, do not apply to SYGUS problems, and do not provide a 
simple and flexible specification mechanism. The lack of a formal treatment 
of quantitative requirements stands in the way of designing synthesizers that 
can take advantage of quantitative objectives to perform more efficient forms of 
synthesis. 

In this paper, we propose QSyGuS, a unifying framework for describing
syntax-guided synthesis problems with quantitative objectives over the syntax 
of the synthesized programs—e.g., find the most likely program with respect to 
a given probability distribution—and present an algorithm for solving synthesis 
problems expressed in this framework. We focus on syntactic objectives because 
they are the most common ones in practical applications of program synthesis. 
For example, in programming by examples it is desirable to produce small pro- 
grams with fewer constants because these programs are more likely to generalize 
to examples outside of the specification [13]. QSyGuS extends SyGuS in two
ways. First, in QSyGuS the search space is represented using weighted grammars,
which augment context-free grammars with the ability to assign weights
to programs. Second, QSyGuS allows the user to specify constraints over the
weight of the program, including optimization objectives—e.g., find the program 
with the fewest if-statements and with the lowest depth. 

QSyGuS is a natural, general, and flexible formalism and is grounded in the
well-studied theory of weighted grammars. We leverage this theory and design 
an algorithm for solving QSYGUS problems using closure properties of weighted 
grammars. Given a QSYGUS problem, our algorithm generates a SYGUS prob- 
lem that can be delegated to existing SYGUS solvers. The algorithm then iter- 
atively refines the solution returned by the SYGUS solver to find an optimal 
one by further generating new SYGUS instances using weighted grammar oper- 
ations. We implement our algorithm in a tool, QUASI, and evaluate it on 26 
quantitative extensions of existing SYGUS benchmarks. QUASI can synthesize 
optimal solutions in 15/26 benchmarks with times comparable to those needed 
to find a solution that does not need to satisfy any quantitative objective. 


Contributions. In summary, our contributions are: 


- QSyGuS, a formal framework grounded in the theory of weighted grammars
that can describe syntax-guided synthesis problems with quantitative objectives
over the syntax of the synthesized programs (Sect. 3).

- An algorithm for solving QSyGuS problems that leverages closure properties
of weighted grammars and existing SyGuS solvers (Sect. 4).

- QUASI, a tool for specifying and solving QSyGuS problems that interfaces
with existing SyGuS solvers, and a comprehensive evaluation of QUASI, which
shows that QUASI can efficiently solve QSyGuS problems over different types
of weights, including additive weights, probabilities, and combinations of
multiple weights (Sect. 5).

Start ::= Start + Start /(0, 1)
        | if(BExpr) then Start else Start /(1, 0)
        | x | y | 0 | 1
BExpr ::= Start > Start
        | ¬BExpr
        | BExpr ∧ BExpr

Fig. 1. Weighted grammar that assigns weight $(w_1, w_2) \in \mathit{Nat} \times \mathit{Nat}$ to a program,
where $w_1$ is the number of if-statements and $w_2$ is the number of plus-statements.


2 Illustrative Example 


In this section, we illustrate the main components of our framework using an 
example. We start with a Syntax-Guided Synthesis (SyGuS) problem in which 
no quantitative objective is provided. We recall that the goal of a SyGuS prob- 
lem is to synthesize a function f of a given type that is accepted by a context-free 
grammar G, and such that $\forall x.\phi(f, x)$ holds (for a given Boolean constraint $\phi$).

The following SYGUS problem asks to synthesize a function that is accepted 
by the following grammar and that computes the max of two numbers. 


Start ::= Start + Start | if(BExpr) then Start else Start | x | y | 0 | 1
BExpr ::= Start > Start | ¬BExpr | BExpr ∧ BExpr


The semantic constraint is given by the following formula.


$\psi(f) \stackrel{\mathrm{def}}{=} \forall x, y.\ f(x, y) \geq x \wedge f(x, y) \geq y \wedge (f(x, y) = x \vee f(x, y) = y)$


The following three programs are semantically equivalent, but syntactically
different solutions.


max_1(x, y) = if(x > y) then x else y
max_2(x, y) = if(x > y) then (x + 0) else (y + 0)
max_3(x, y) = if(x > y) then x else (if(y > x) then y else x)


All solutions are correct, but the user might, for example, prefer the smallest
one. However, SYGUS does not provide ways to specify this quantitative intent. 


Adding Weights. In our formalism, QSyGuS, we augment context-free grammars
to assign weights to programs in the search space. Concretely, we adopt
weighted grammars [10], a well-studied formalism with many desirable properties.
In a weighted grammar, each production is assigned a weight. For example,
the weighted grammar shown in Fig. 1 extends the one from the previous SyGuS
example to assign to each program p a pair of weights $(w_1, w_2)$ where $w_1$ is the
number of if-statements and $w_2$ is the number of plus operators in p. In this case,
the weights are pairs of integers and the weight of a grammar derivation is the
pairwise sum of all the weights of the productions involved in the derivation;
e.g., the sum of $(w_1, w_2)$ and $(w_1', w_2')$ is $(w_1 + w_1', w_2 + w_2')$. In the figure, we
write $/(w_1, w_2)$ to assign weight $(w_1, w_2)$ to a production. We omit the weight for
productions with cost (0,0). The functions $max_1$, $max_2$, and $max_3$ have weights
(1,0), (1,2), and (2,0) respectively.

Adding and Solving Quantitative Objectives. Once we have a way to assign
weights to programs, QSyGuS allows the user to specify quantitative objectives
over the weights of the productions; e.g., only allow solutions with fewer
than 4 if-statements. In our example, we could require the solution to be minimal
with respect to the number of if-statements, i.e., minimize the first component
of the paired weight. With these constraints both $max_1$ and $max_2$ would be
considered optimal solutions because there exists no solution with 0 if-statements. If
we require the solution to also be minimal with respect to the second component
of the paired weight, $max_1$ will be a possible optimal solution.
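To make the weights of the three example programs concrete, the sketch below counts if-statements and plus operators directly on a term representation. The nested-tuple encoding and the names `weight`, `optimal_by_ifs`, and `optimal_sorted` are assumptions made for illustration; they are not part of QUASI.

```python
def weight(term):
    """Pair (number of if-statements, number of plus operators) of a term.

    Terms are nested tuples (operator, argument, ...); variables and
    constants are plain strings or integers and carry weight (0, 0).
    """
    if not isinstance(term, tuple):
        return (0, 0)
    op, *args = term
    ifs, pluses = (1, 0) if op == "if" else (0, 1) if op == "+" else (0, 0)
    for arg in args:
        i, p = weight(arg)
        ifs, pluses = ifs + i, pluses + p
    return (ifs, pluses)


max1 = ("if", (">", "x", "y"), "x", "y")
max2 = ("if", (">", "x", "y"), ("+", "x", 0), ("+", "y", 0))
max3 = ("if", (">", "x", "y"), "x", ("if", (">", "y", "x"), "y", "x"))

weights = {"max1": weight(max1), "max2": weight(max2), "max3": weight(max3)}

# Objective 1: minimize only the number of if-statements.
fewest_ifs = min(w[0] for w in weights.values())
optimal_by_ifs = sorted(n for n, w in weights.items() if w[0] == fewest_ifs)

# Objective 2: additionally minimize the number of plus operators
# (Python compares tuples lexicographically).
optimal_sorted = min(weights, key=weights.get)
```

Only these three candidates are compared here; an actual solver has to search the whole grammar.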

Our tool QUASI can automatically discover solutions in both these cases.
Let's consider the last minimization objective. In this case, QUASI first uses
existing SyGuS solvers to synthesize an initial solution using the non-weighted
version of the grammar. Let's say that the returned solution is, for example,
$max_3$ of weight (2,0). QUASI uses this solution to build a new SyGuS instance
that only accepts programs with at most one if-statement. Solving this SyGuS
problem can, for example, result in the program $max_2$ of weight (1,2), which
will require our solver to build yet another SyGuS instance. This approach is
repeated and, if it terminates, an optimal program is found.


3 SyGuS with Quantitative Objectives 


In this section, we introduce our framework for defining syntax-guided synthesis 
problems with quantitative objectives over the syntax of the synthesized pro- 
grams. We first provide preliminary definitions for notions such as semirings 
(Sect. 3.1) and weighted tree grammars (Sect. 3.2), and then use these notions
to augment SYGUS problems with quantitative objectives (Sect. 3.3). 


3.1 Weights over Semirings 


We now define the universe of weights we will assign to programs. In gen- 
eral, weights are defined using monoids—i.e., sets equipped with an addition 
operator—but when a grammar is nondeterministic—i.e., it can produce the 
same term using multiple derivations—the same term might be assigned multi- 
ple weights. Hence, we choose to use semirings. Since we also care about opti- 
mization objectives, we assume all our semirings are equipped with a partial 
order. 


Definition 1 (Semiring). An (ordered) semiring is a pair $(\mathcal{S}, \preceq)$ where
(i) $\mathcal{S} = (S, \oplus, \otimes, 0, 1)$ is an algebra consisting of a commutative monoid $(S, \oplus, 0)$
and a monoid $(S, \otimes, 1)$ such that $\otimes$ distributes over $\oplus$, $0 \neq 1$, and, for every
$x \in S$, $x \otimes 0 = 0$, and (ii) $\preceq \subseteq S \times S$ is a partial order over S.


We often use the word semiring to refer to just the algebra S. 


Example 1. In this paper, we focus on semirings with the following algebras.

Boolean $\mathit{Bool} = (\mathbb{B}, \vee, \wedge, 0, 1)$. This semiring only contains the values true and
false and is used to represent non-quantitative problems.

Tropical $\mathit{Trop} = (\mathbb{Z} \cup \{\infty\}, \min, +, \infty, 0)$. This semiring is the most common
one and is used to assign additive weights; e.g., term sizes and term depth.
In this case, we typically consider the order $\preceq \, := \, \leq$.

Probabilistic $\mathit{Prob} = ([0, 1], +, \cdot, 0, 1)$. This semiring is used to assign
probabilities to terms in a grammar.
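As an illustration of Definition 1 and Example 1, the three algebras can be written down as small records and their laws spot-checked on sample values. This is a sketch only: the `Semiring` class and the `laws_hold` helper are illustrative names, and checking a few samples is of course not a proof of the semiring axioms.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # the "sum" operation
    times: Callable[[Any, Any], Any]  # the "product" operation
    zero: Any                         # identity of plus, annihilator of times
    one: Any                          # identity of times


BOOL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
TROP = Semiring(min, lambda a, b: a + b, math.inf, 0)
PROB = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)


def laws_hold(sr, samples):
    """Spot-check identities, annihilation, and distributivity on samples."""
    for a in samples:
        if sr.plus(a, sr.zero) != a or sr.times(a, sr.one) != a:
            return False
        if sr.times(a, sr.zero) != sr.zero:
            return False
    return all(
        sr.times(a, sr.plus(b, c)) == sr.plus(sr.times(a, b), sr.times(a, c))
        for a, b, c in product(samples, repeat=3)
    )


# The Prob samples are dyadic fractions so that floating-point sums are exact.
checks = {
    "Bool": laws_hold(BOOL, [False, True]),
    "Trop": laws_hold(TROP, [0, 1, 5, math.inf]),
    "Prob": laws_hold(PROB, [0.0, 0.25, 0.5, 1.0]),
}
```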


In our framework, we allow synthesis problems to have multiple objectives. 
Hence, we define a product operation to compose semirings. Intuitively, the fol- 
lowing operation composes algebras of semirings to create a pair and applies the 
operation of each algebra to the corresponding projections of the pair. Similarly, 
two orders can be composed to create an order over pairs of elements. We pro- 
pose two such compositions, one which assigns equal weights to the two orders 
and one which prefers one order over the other (Sorted). 


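Definition 2 (Products). Given two algebras $\mathcal{S}_1 = (S_1, \oplus_1, \otimes_1, 0_1, 1_1)$ and
$\mathcal{S}_2 = (S_2, \oplus_2, \otimes_2, 0_2, 1_2)$, the product algebra is the tuple
$\mathcal{S}_1 \times_s \mathcal{S}_2 = (S_1 \times S_2, \oplus, \otimes, (0_1, 0_2), (1_1, 1_2))$ such that for every
$x_1, x_2 \in S_1$ and $y_1, y_2 \in S_2$, we have
$(x_1, y_1) \oplus (x_2, y_2) = (x_1 \oplus_1 x_2, y_1 \oplus_2 y_2)$ and
$(x_1, y_1) \otimes (x_2, y_2) = (x_1 \otimes_1 x_2, y_1 \otimes_2 y_2)$.

Given two partial orders $\preceq_1 \subseteq S_1 \times S_1$ and $\preceq_2 \subseteq S_2 \times S_2$, the Pareto product of
the two orders is defined as the partial order
$\preceq_p = \mathrm{PAR}(\preceq_1, \preceq_2) \subseteq (S_1 \times S_2) \times (S_1 \times S_2)$ such that, for every
$x_1, x_2 \in S_1$ and $y_1, y_2 \in S_2$, we have $(x_1, y_1) \preceq_p (x_2, y_2)$
iff $x_1 \preceq_1 x_2$ and $y_1 \preceq_2 y_2$.

Given two partial orders $\preceq_1 \subseteq S_1 \times S_1$ and $\preceq_2 \subseteq S_2 \times S_2$, the Sorted product
of the two orders is defined as the partial order
$\preceq_s = \mathrm{SORT}(\preceq_1, \preceq_2) \subseteq (S_1 \times S_2) \times (S_1 \times S_2)$ such that, for every
$x_1, x_2 \in S_1$ and $y_1, y_2 \in S_2$, we have $(x_1, y_1) \preceq_s (x_2, y_2)$
iff $x_1 \prec_1 x_2$ or ($x_1 = x_2$ and $y_1 \preceq_2 y_2$).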


Example 2. The weights in the grammar in Fig. 1 are from the product semiring
$\mathit{Trop} \times_s \mathit{Trop}$. When using the Pareto partial order, we have, for example,
$(1,0) \preceq (2,0)$ and $(1,0) \preceq (1,2)$, but (1,2) is incomparable to (2,0). When using the
Sorted product, we have, for example, $(1,0) \preceq (1,2) \preceq (2,0)$.
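The two product orders can be sketched as small comparison functions and checked against the pairs of Example 2. The function names and the default component orders (the usual order on integers) are assumptions for illustration, and the Sorted product is read here as a lexicographic order whose first comparison is strict.

```python
def int_leq(a, b):
    """Default component order: the usual order on integers."""
    return a <= b


def pareto_leq(a, b, leq1=int_leq, leq2=int_leq):
    """Pareto product: both components must be ordered."""
    return leq1(a[0], b[0]) and leq2(a[1], b[1])


def sorted_leq(a, b, leq1=int_leq, leq2=int_leq):
    """Sorted product: the first component decides, the second breaks ties."""
    strictly_smaller = leq1(a[0], b[0]) and a[0] != b[0]
    return strictly_smaller or (a[0] == b[0] and leq2(a[1], b[1]))


def incomparable(a, b, leq):
    """True when neither a <= b nor b <= a holds under the given order."""
    return not leq(a, b) and not leq(b, a)
```

Under the Pareto product (1,2) and (2,0) are incomparable, so several mutually incomparable optimal weights may coexist; under the Sorted product every pair of integer weights is comparable.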


3.2 Weighted Tree Grammars 


Since SyGuS defines search spaces using context-free grammars, we propose to
extend this formalism with weights to assign costs to terms in the grammar. We
focus our attention on a restricted class of context-free grammars called regular
tree grammars (i.e., grammars generating regular tree languages) because, to
our knowledge, the benchmarks appearing in the SyGuS competition [3] and
in practical applications of SyGuS operate over tree grammars. Moreover, it
was recently shown that SyGuS problems that are undecidable for context-free 
grammars become decidable with weighted tree grammars [8]. 

Trees. A ranked alphabet is a tuple $(\Sigma, rk_\Sigma)$ where $\Sigma$ is a finite set of symbols
and $rk_\Sigma : \Sigma \to \mathbb{N}$ associates a rank to each symbol. For every $m \geq 0$, the set
of all symbols in $\Sigma$ with rank m is denoted by $\Sigma^{(m)}$. In our examples, a ranked
alphabet is specified by showing the set $\Sigma$ and attaching the respective rank to
every symbol as superscript; e.g., $\Sigma = \{+^{(2)}, c^{(0)}\}$. We use $T_\Sigma$ to denote the set
of all (ranked) trees over $\Sigma$; i.e., $T_\Sigma$ is the smallest set such that (i) $\Sigma^{(0)} \subseteq T_\Sigma$,
(ii) if $\sigma \in \Sigma^{(k)}$ and $t_1, \ldots, t_k \in T_\Sigma$, then $\sigma(t_1, \ldots, t_k) \in T_\Sigma$. In the following we
assume a fixed ranked alphabet $(\Sigma, rk_\Sigma)$.


Weighted Tree Grammars. Tree grammars are similar to word grammars but
they generate ranked trees instead of words. Weighted tree grammars augment
tree grammars by assigning weights from a semiring to trees. They do so by
associating weights to productions in the grammar. Weighted grammars can, for
example, compute the height of a tree, the number of occurrences of some node
in the tree, or the probability of a tree with respect to some distribution. In the
following, we assume a fixed semiring $(\mathcal{S}, \preceq)$ where $\mathcal{S} = (S, \oplus, \otimes, 0, 1)$.


Definition 3 (Weighted Tree Grammar). A weighted tree grammar
(WTG) is a tuple $G = (N, Z, P, \mu)$, where N is a set of non-terminal symbols
with arity 0, Z is an axiom with $Z \in N$, P is a set of production rules of
the form $A \to \beta$ where $A \in N$ is a non-terminal and $\beta$ is a tree of $T(\Sigma \cup N)$,
and $\mu : P \to S$ is a function assigning to each production a weight from the
semiring.


We can now define the semantics of a WTG as a function $w_G : T_\Sigma \to S$,
which assigns weights to trees. Intuitively, the weight of a tree is the $\oplus$-sum of the
weights of every possible derivation of that tree in the grammar, and the weight of
a derivation is the $\otimes$-product of the weights of the productions appearing in the
derivation. We use $MS(\beta) = (X_1, \ldots, X_k)$ to denote the multi-set of all nonterminals
appearing in $\beta$ and $\beta[t_1/X_1, \ldots, t_k/X_k]$ to denote the result of simultaneously
substituting each $X_i$ with $t_i$ in $\beta$. Given a production $p = A \to \beta$ such that
$MS(\beta) = (X_1, \ldots, X_k)$, we assume that p is a symbol of arity k. A derivation d
starting at non-terminal X is a tree of productions $d \in T(P)$ representing one
possible way to derive a tree starting from X. The derivation has to be such that:
(i) the root of d is a production of the form $X \to \beta$, (ii) for every node $p = A \to \beta$
in d, if $MS(\beta) = (X_1, \ldots, X_k)$, then, for every $1 \leq i \leq k$, the i-th child of p is
a production $X_i \to \beta_i$. Given a derivation d with root $p = X \to \beta$, such that
$MS(\beta) = (X_1, \ldots, X_k)$ and p has children subtrees $d_1, \ldots, d_k$, the tree generated
by d is recursively defined as $tree(d) = \beta[tree(d_1)/X_1, \ldots, tree(d_k)/X_k]$.
We use $\mathrm{DER}(X, t)$ to denote the set of all derivations d starting at X, such that
$tree(d) = t$. The weight $dw(d)$ of a derivation d is the $\otimes$-product of the weights
of the productions appearing in the derivation. Finally, the weight of a tree t is
the $\oplus$-sum of the weights of all the derivations of t from the initial nonterminal:
$w_G(t) = \bigoplus_{d \in \mathrm{DER}(Z, t)} dw(d)$. A weighted tree grammar is unambiguous iff, for
every $t \in T_\Sigma$, there exists at most one derivation; i.e., $|\mathrm{DER}(Z, t)| \leq 1$.
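The semantics above can be sketched for the grammar of Fig. 1 by a recursive function that sums over the productions of a nonterminal and multiplies the weights of matched subtrees; by distributivity this equals the sum over all derivations. The encoding (bare strings for nonterminals, tuples for alphabet symbols, the names `match` and `tree_weight`) is an assumption made for this example, and the recursion assumes the grammar has no cyclic chain productions.

```python
import math

INF = math.inf
ZERO, ONE = (INF, INF), (0, 0)  # identities of the product of two tropical semirings


def plus(a, b):
    """Component-wise min, the sum of the product semiring."""
    return (min(a[0], b[0]), min(a[1], b[1]))


def times(a, b):
    """Component-wise addition, the product of the product semiring."""
    return (a[0] + b[0], a[1] + b[1])


# Productions (A, beta, weight). In beta, a bare string is a nonterminal and a
# tuple (symbol, child, ...) is a node labelled with a symbol of the alphabet.
PRODUCTIONS = [
    ("Start", ("+", "Start", "Start"), (0, 1)),
    ("Start", ("if", "BExpr", "Start", "Start"), (1, 0)),
    ("Start", ("x",), ONE), ("Start", ("y",), ONE),
    ("Start", ("0",), ONE), ("Start", ("1",), ONE),
    ("BExpr", (">", "Start", "Start"), ONE),
    ("BExpr", ("not", "BExpr"), ONE),
    ("BExpr", ("and", "BExpr", "BExpr"), ONE),
]


def match(beta, tree):
    """Weight contributed by deriving `tree` from the right-hand side beta."""
    if isinstance(beta, str):
        return tree_weight(beta, tree)
    if beta[0] != tree[0] or len(beta) != len(tree):
        return ZERO
    acc = ONE
    for sub_beta, sub_tree in zip(beta[1:], tree[1:]):
        acc = times(acc, match(sub_beta, sub_tree))
    return acc


def tree_weight(nonterminal, tree):
    """Sum, over the productions of `nonterminal`, of weight times sub-weights."""
    acc = ZERO
    for lhs, beta, mu in PRODUCTIONS:
        if lhs == nonterminal:
            acc = plus(acc, times(mu, match(beta, tree)))
    return acc


X, Y, C0 = ("x",), ("y",), ("0",)
max1 = ("if", (">", X, Y), X, Y)
max2 = ("if", (">", X, Y), ("+", X, C0), ("+", Y, C0))
```

A tree that the grammar cannot derive from the given nonterminal receives the zero of the semiring, here (inf, inf).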

Weighted tree grammars generalize weighted tree automata. In particular, a
weighted tree automaton (WTA) is a WTG in which every production is of the
form $A \to \sigma(T_1, \ldots, T_k)$, where $A \in N$, each $T_i \in N$, and $\sigma \in \Sigma^{(k)}$. Finally, a
tree automaton (TA) is a WTA over the Boolean semiring; i.e., the TA accepts
all trees with some derivation yielding true. Similarly, a tree grammar (TG) is
a WTG over the Boolean semiring. Given a TA (resp. TG) G, we use L(G) to
denote the set of trees accepted by G; i.e., $L(G) = \{t \mid w_G(t) = \mathit{true}\}$.


Example 3. The weighted grammar in Fig. 1 operates over the semiring
$\mathit{Trop} \times_s \mathit{Trop}$, $N = \{\mathit{Start}, \mathit{BExpr}\}$, $Z = \mathit{Start}$, P contains 9 productions, and $\mu$ assigns
non-zero weights to two of them.


Aside from being a natural formalism for assigning weights to trees, TGs
and WTGs enjoy properties that make them a good choice for our model. First,
WTGs (resp. TGs) are equi-expressive to WTAs (resp. TAs) and have logic
characterizations [9-11]. For this reason, tree grammars are closed under
Boolean operations and enjoy decidable equivalence [9]. Second, WTGs enjoy
many closure and decidability properties; e.g., given two WTGs $G_1$ and $G_2$,
we can compute the grammars $G_1 \oplus G_2$ and $G_1 \otimes G_2$ such that, for every t,
$w_{G_1 \oplus G_2}(t) = w_{G_1}(t) \oplus w_{G_2}(t)$ and $w_{G_1 \otimes G_2}(t) = w_{G_1}(t) \otimes w_{G_2}(t)$. These
operations are convenient for building grammars over product semirings.


3.3 QSyGuS 


In this section, we formally define QSyGuS, which extends SyGuS with quantitative
objectives. In SyGuS, a problem is specified with respect to a background
theory T (e.g., linear arithmetic) and the goal is to synthesize a function f that
satisfies two constraints provided by the user. The first constraint describes a
functional semantic property that f should satisfy and is given as a predicate
$\psi(f) = \forall x.\phi(f, x)$. The second constraint limits the search space S of f and is
given as a set of expressions specified by a context-free grammar G defining a
subset of all the terms in T. A solution to the SyGuS problem is an expression
e in S such that the formula $\psi(e)$ is valid.

We augment such a framework in two ways. First, we replace context free 
grammars with WTGs, which we use to assign weights (from a given semiring) 
to terms. Second, we augment the problem formulation with constraints over 
the weight of the synthesized program—i.e., only consider programs of weight 
greater than 2—and optimization objectives over the same weight—i.e., find the 
solution of minimal weight. Weight constraints range over the grammar 


$WC := WC \wedge WC \mid WC \vee WC \mid \neg WC \mid w \preceq s \mid s \preceq w \mid w \prec s \mid s \prec w$,


where w is a special variable and s is an element of the semiring under consideration.
Given a constraint $\omega \in WC$, we write $\omega(t)$ to denote the term obtained
by replacing w with t in $\omega$.
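A weight constraint from this grammar can be evaluated on a concrete weight with a short recursive function. The tagged-tuple encoding and the name `holds` below are assumptions for illustration; the order is passed in as a function so that the same evaluator works for any ordered semiring.

```python
def holds(constraint, weight, leq):
    """Evaluate a weight constraint on a concrete weight.

    Constraints are tagged tuples: ("and", c1, c2), ("or", c1, c2),
    ("not", c), and the atoms ("w<=s", s), ("s<=w", s), ("w<s", s),
    ("s<w", s), where s is a semiring element and leq is the order.
    """
    tag = constraint[0]
    if tag == "and":
        return holds(constraint[1], weight, leq) and holds(constraint[2], weight, leq)
    if tag == "or":
        return holds(constraint[1], weight, leq) or holds(constraint[2], weight, leq)
    if tag == "not":
        return not holds(constraint[1], weight, leq)
    s = constraint[1]
    if tag == "w<=s":
        return leq(weight, s)
    if tag == "s<=w":
        return leq(s, weight)
    if tag == "w<s":
        return leq(weight, s) and weight != s
    if tag == "s<w":
        return leq(s, weight) and weight != s
    raise ValueError(f"unknown constraint tag: {tag!r}")


def int_leq(a, b):
    return a <= b


fewer_than_4 = ("w<s", 4)                                  # e.g. fewer than 4 if-statements
strictly_between = ("and", ("s<w", 0), ("not", ("s<=w", 4)))  # 0 < w and not (4 <= w)
```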


Definition 4 (QSyGuS). A QSyGuS problem is a tuple
$(T, (\mathcal{S}, \preceq), \psi(f), G, \omega, \mathrm{OPT})$ where:


- T is a background theory.
- $(\mathcal{S}, \preceq)$ is an ordered semiring defining the set of weights and their operations.
- G is a weighted tree grammar with weights over the semiring $\mathcal{S}$ and that only
contains terms in T; i.e., $L(G) \subseteq T$.
- $\psi(f) = \forall x.\phi(f, x)$ is a Boolean formula constraining the semantic behavior
of the synthesized program f.
- $\omega \in WC$ is a set of constraints over the weight w of the synthesized program.
- OPT is a Boolean denoting whether the solution has to have minimal weight
with respect to $\preceq$.


A solution to the QSyGuS problem is a term e such that $e \in L(G)$, $\psi(e)$ is
true, and $\omega(w_G(e))$ is true. If OPT is true, we also require that there is no g that
satisfies the previous conditions and such that $w_G(g) \prec w_G(e)$.


Algorithm 1. QSyGuS synthesis algorithm
1: procedure QSyGuS-Solve(T, S, ψ, G, ω, OPT)
2:   G' ← ReduceGrammar(G, ω)                ▷ extract grammar satisfying ω
3:   f* ← SyGuS(T, ψ, G')                     ▷ solve corresponding SyGuS problem
4:   if OPT = false then return f*
5:   while true do
6:     G' ← ReduceGrammar(G', w ≺ w_G(f*))
7:     f ← SyGuS(T, ψ, G')                    ▷ try to find better solution
8:     if f = ⊥ then return f*                ▷ return the optimal solution
9:     f* ← f


A SyGuS problem is a QSyGuS problem without weight constraints; i.e., $\omega \equiv$
true and OPT = false. We denote such problems just as triples $(T, \psi(f), G)$.


Example 4. Consider the QSyGuS problem described in Sect. 2. We already
described all the components but $\omega$ and OPT in the rest of this section. In this
example, $\omega \equiv$ true and OPT = true because we want to synthesize the solution
with minimal weight.


4 Solving QSyGuS Problems via Grammar Reduction 


In this section, we present an algorithm for solving QSyGuS problems
(Algorithm 1), which works as follows. First, given a QSyGuS problem, we construct
(under certain assumptions) a SyGuS problem for which the solution is guaranteed
to satisfy the weight constraints $\omega$ (line 2) and use existing SyGuS solvers
to find a solution to such a problem (line 3). If the QSyGuS problem requires
minimization, our algorithm produces a new SyGuS instance to search for a
solution that is better than the previously found one and tries to solve it (lines
6-7). This procedure is repeated until an optimal solution is found (line 8).
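The refinement loop described above can be sketched with a stand-in for the SyGuS solver. Everything here is an assumption made for illustration: `toy_sygus` merely scans a fixed pool of already-correct candidates, the weight filter passed to it plays the role of the reduced grammar, and tuple comparison stands in for the strict Sorted order. It is not QUASI's implementation.

```python
def qsygus_solve(sygus, weight, admissible=lambda w: True, optimize=True):
    """Find a solution, then repeatedly ask for a strictly better one.

    sygus(allowed) returns a program whose weight satisfies `allowed`,
    or None when no such program exists.
    """
    best = sygus(admissible)
    if best is None or not optimize:
        return best
    while True:
        bound = weight(best)
        better = sygus(lambda w, b=bound: admissible(w) and w < b)
        if better is None:
            return best  # nothing strictly better exists: best is optimal
        best = better


# Stand-in solver: a fixed pool of correct programs in an unfavourable order.
POOL = [("max3", (2, 0)), ("max2", (1, 2)), ("max1", (1, 0))]
WEIGHT = dict(POOL)
calls = []  # records what each solver call returned


def toy_sygus(allowed):
    for name, w in POOL:
        if allowed(w):
            calls.append(name)
            return name
    calls.append(None)
    return None


result = qsygus_solve(toy_sygus, WEIGHT.get)
```

On this pool the loop visits the candidates of weight (2,0), (1,2), and (1,0) in turn and stops after a fourth call that finds nothing better, matching the narrative of Sect. 2. With a solver over an infinite search space the loop need not terminate.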


4.1 From QSyGuS to SyGuS 


The first step of our algorithm is to construct a SyGuS problem characterizing
exactly all the solutions of the QSyGuS problem that satisfy the weight
constraints. Given a QSyGuS problem $P = (T, (\mathcal{S}, \preceq), \psi(f), G, \omega, \mathrm{OPT})$, we construct
a SyGuS problem $P' = (T, \psi(f), G')$ such that a function g is a solution
to the SyGuS problem P' iff g is a solution of $(T, (\mathcal{S}, \preceq), \psi(f), G, \omega, \mathit{false})$,
where the optimization constraint has been dropped. We denote the grammar
reduction operation as $G' \leftarrow \mathrm{ReduceGrammar}(G, \omega)$.


Base case. First we show how to solve the problem when $\omega$ is an atomic formula;
i.e., of the form $w \prec s$, $s \prec w$, $w \preceq s$, or $s \preceq w$. We start by showing how to solve
the problem for $w \prec s$, as the construction is identical for the other constraints.

Concretely, we are given a WTG $G = (N, Z, P, \mu)$ and we want to construct
a TG $G'_{\prec s} = (N', Z', P')$ such that $t \in L(G'_{\prec s})$ iff $w_G(t) \prec s$. In general, it is not
possible to perform this construction for arbitrary semirings and grammars. We
first present our algorithm and then describe sufficient conditions under which
we can ensure termination and correctness.

The idea behind our construction is to introduce new nonterminals in the
grammar $G'_{\prec s}$ to keep track of the weight of the trees that can be produced
from those nonterminals. For example, a nonterminal pair $(X, s')$ will derive all
trees derivable from X using a single derivation of weight $s'$. Therefore, the set
of nonterminals $N'$ is a subset of $N \times S$ (plus an initial nonterminal $Z'$), where
S is the universe of the WTG's semiring. We construct our set of nonterminals
$N'$ starting from the leaf productions of G and then recursively explore other
productions. At the same time we generate the set of productions $P'$. Formally,
$N'$ and $P'$ are the smallest sets such that the following conditions hold.


1. $Z' \in N'$ (the initial nonterminal).

2. For every production $p \in P$ such that $p = (A \to \beta)$ and $\beta \in T_\Sigma$ (i.e., $p$ is a
leaf) and $\mu(p) \prec s$, then $(A, \mu(p)) \in N'$ and $((A, \mu(p)) \to \beta) \in P'$. If $A = Z$,
then $Z' \to (A, \mu(p)) \in P'$.

3. For every production $p \in P$ such that $p = (A \to \beta)$, $MS(\beta) =
(X_1, \ldots, X_k)$, $(X_1, s_1), \ldots, (X_k, s_k) \in N'$ (for some values $s_i \in S$), and
$\mu(p) \otimes s_1 \otimes \cdots \otimes s_k = s'$, $s' \prec s$, then $(A, s') \in N'$, and $((A, s') \to
\beta[(X_1, s_1)/X_1, \ldots, (X_k, s_k)/X_k]) \in P'$. If $A = Z$, then $Z' \to (A, s') \in P'$.
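The three conditions above can be read as a fixpoint computation. The following is a minimal sketch, not the authors' implementation: it assumes integer weights, the semiring product $\otimes$ instantiated as `+`, the order $\prec$ instantiated as `<`, and productions flattened to the hypothetical form (lhs, label, child nonterminals, weight). The toy grammar is invented for illustration.

```python
from itertools import product

# Hypothetical encoding: each production is (lhs, label, child_nonterminals, weight).
# Weights are integers, the semiring product is +, and the order is the usual <.
PRODUCTIONS = [
    ("Start", "x", (), 0),
    ("Start", "plus", ("Start", "Start"), 1),
]

def reduce_grammar(productions, start, bound):
    """Build annotated nonterminals (A, s') and productions with s' < bound."""
    pairs = set()      # annotated nonterminals (A, s')
    new_prods = set()  # productions of the reduced grammar
    changed = True
    while changed:  # iterate to a fixpoint (conditions 2 and 3)
        changed = False
        for lhs, label, children, weight in productions:
            # All annotated versions found so far for each child nonterminal.
            choices = [[p for p in pairs if p[0] == c] for c in children]
            for combo in product(*choices):  # a leaf yields one empty combo
                total = weight + sum(s for _, s in combo)
                if not total < bound:
                    continue
                rule = ((lhs, total), label, combo)
                if rule not in new_prods:
                    new_prods.add(rule)
                    pairs.add((lhs, total))
                    changed = True
    # Condition 1 and the "If A = Z" clauses: connect the fresh start symbol Z'.
    for head in pairs:
        if head[0] == start:
            new_prods.add(("Z'", None, (head,)))
    return pairs, new_prods

pairs, prods = reduce_grammar(PRODUCTIONS, "Start", bound=2)
print(sorted(pairs))
```

With the bound 2, only terms with at most one `plus` survive. The loop terminates here because all weights are non-negative; with negative weights on a loop the set of pairs could grow without bound, which is the situation the termination conditions are meant to exclude.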


Example 5. We illustrate our construction using the grammar in Fig. 1. Assume
the weight constraint is $w \preceq (1,0)$ and the partial order is built using a
Pareto product, i.e., we accept terms with at most one if-statement and no
plus-statements. Our construction yields the following grammar.


Z'          ::= (Start,1,0) | (Start,0,0)
(Start,1,0) ::= if((BExpr,0,0)) then (Start,0,0) else (Start,0,0) | x | y | 0 | 1
(Start,0,0) ::= x | y | 0 | 1
(BExpr,0,0) ::= (Start,0,0) > (Start,0,0) | ¬(BExpr,0,0) | (BExpr,0,0) ∧ (BExpr,0,0)


The construction of $G_{\prec s}$ only terminates for certain semirings and grammars,
and only guarantees that individual derivations yield the correct weight, i.e., it
does not account for the $\oplus$-sum of multiple derivations.

Example 6. The following WTG over Prob is ambiguous and, if we apply the
grammar reduction algorithm for $\omega := w \prec 0.6$, the resulting grammar will be
empty. However, the tree 1 + 1 has weight $0.9 \prec 0.6$ (since $0.9 > 0.6$).


Start ::= Start + Start/0.5 | x | 0 | 1 | Expr
Expr  ::= Expr + Expr/0.4 | x | 0 | 1
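The gap in Example 6 can be checked with a few lines of arithmetic. The sketch below assumes that unweighted productions carry the multiplicative identity 1, that Prob multiplies weights along a derivation and adds them across derivations, and that $w \prec s$ in Prob means that $w$ is the numerically larger probability (as the parenthetical $0.9 > 0.6$ suggests). The variable names are illustrative only.

```python
from fractions import Fraction

# Weights of the two derivations of the tree 1 + 1 (exact rationals avoid
# floating-point noise). Unweighted productions are assumed to have weight 1.
via_start = Fraction(1, 2) * 1 * 1      # Start -> Start + Start (0.5), leaves 1 and 1
via_expr = 1 * Fraction(2, 5) * 1 * 1   # Start -> Expr, then Expr -> Expr + Expr (0.4)

BOUND = Fraction(3, 5)                  # the constraint w "better than" 0.6

def better_than(weight, bound):
    """Assumed reading of the Prob order: a higher probability is better."""
    return weight > bound

tree_weight = via_start + via_expr      # sum over all derivations of the tree

print(better_than(via_start, BOUND), better_than(via_expr, BOUND))
print(tree_weight, better_than(tree_weight, BOUND))
```

Neither derivation passes the bound on its own, so a construction that tracks derivations one at a time discards both, even though their sum 9/10 satisfies the constraint.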


We now identify sufficient conditions under which the construction of $G_{\prec s}$
terminates and is sound. In particular, we start by restricting our attention
to unambiguous WTGs, which are the common ones in practice. We use
$\mathrm{WEIGHTS}(G) = \{s \mid p \in P \wedge \mu(p) = s\}$ to denote the set of weights used by $G$ and
$M_{S,G} = (S', \otimes, 1)$ to denote the submonoid of $S$ generated by $\mathrm{WEIGHTS}(G)$, i.e.,
the set of all weights we can generate using $\otimes$ and $\mathrm{WEIGHTS}(G)$.


Theorem 1. Given an unambiguous WTG $G$ over a semiring $S$ such that
$M_{S,G} = (S', \otimes, 1)$, and a weight $s \in S$, the construction of $G_{\prec s}$ terminates
if the set $\{s' \mid s' \prec s \wedge s' \in S'\}$ is finite. Moreover, if the set of weights
$\mathrm{WEIGHTS}(G)$ is monotonically increasing with respect to $\preceq$, i.e., for every $s \in S$
and $s' \in \mathrm{WEIGHTS}(G)$, $s \preceq s \otimes s'$, then $L(G_{\prec s})$ contains exactly every tree $t$
such that $w_G(t) \prec s$.
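To make the finiteness condition concrete, the sketch below enumerates the set $\{s' \mid s' \prec s \wedge s' \in S'\}$ for one assumed instance: integer weights, $\otimes$ instantiated as `+` with identity 0, and $\prec$ as `<`. The generator sets are hypothetical examples, not taken from the benchmarks.

```python
def weights_below(generators, bound):
    """Elements of the submonoid generated by `generators` under + that are < bound.

    Assumes non-negative generators; with a negative generator the search
    below would not terminate, mirroring the failure of the finiteness condition.
    """
    reached = {0} if 0 < bound else set()  # 0 is the identity of +
    frontier = list(reached)
    while frontier:
        s = frontier.pop()
        for g in generators:
            t = s + g
            if t < bound and t not in reached:
                reached.add(t)
                frontier.append(t)
    return reached

def is_monotone(generators):
    """s <= s + g holds for every s exactly when g >= 0."""
    return all(g >= 0 for g in generators)

print(sorted(weights_below({2, 3}, 8)))
print(is_monotone({2, 3}), is_monotone({-1, 2}))
```

For generators 2 and 3 and bound 8 the set is finite, so the number of annotated nonterminals the construction can create is bounded.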


The theorem above also holds for the other atomic constraints $w \preceq s$, $s \prec w$,
or $s \preceq w$ (for these last two, the direction of the monotonicity is reversed).
Moreover, in certain cases, even if the construction may not terminate for, say,
$s \prec w$, it might terminate for the negated constraint $w \preceq s$. In such a case, we
can use the closure properties of regular tree grammars/automata to construct
the reduced grammar for $s \prec w$ as $G_{s \prec w} = \mathrm{INTERSECT}(G, \mathrm{COMPLEMENT}(G_{w \preceq s}))$.
The same idea can be applied to all atomic constraints.

In practice, the restriction of Theorem 1 holds for grammars that operate 
over the Boolean and probabilistic semirings, and the tropical semiring only 
with positive weights. Theorem 1 never holds when S is the tropical semiring 
and the grammar contains negative weights. In general, one cannot construct 
the constrained grammar in this case. However, it is easy to modify our algo- 
rithm to work with grammars that do not contain loops—i.e., derivations from a 
nonterminal to a tree containing the same nonterminal—with negative weights. 

Intuitively, when the grammar contains no negative loops, we can find a
constant SH such that any intermediate derivation with weight greater than s + SH
will never result in a tree with weight smaller than s. We use this idea to modify
the construction of $G_{\prec s}$ (i.e., $G_{<s}$ for Trop) as follows. First, this constant is
bounded by ck”! where c is the absolute value of the smallest negative weight 
in the grammar, $k$ is the largest number of nonterminals appearing in one
grammar production, and $n = |N|$ is the number of nonterminals. Second, in steps 2
and 3 of the construction, a new nonterminal and the corresponding productions
are produced if $\mu(p) < s + |\mathrm{SH}|$ (previously $\mu(p) < s$). However, if $A = Z$ in
steps 2 and 3, we add a new production $Z' \to (A, s')$ only if $s' < s$.

We now show when this construction terminates and returns correct values.
Since the tropical semiring combines multiple runs using the min operator, we
can drop the requirement that the grammar has to be unambiguous.

Theorem 2. Given a WTG $G$ over Trop and a weight $s \in \mathbb{Z}$, the construction
of $G_{<s}$ terminates if $G$ contains no loop with cumulative negative weight.
Moreover, $G_{<s}$ contains exactly every tree $t$ such that $w_G(t) < s$.


Composing semirings. We next discuss how Theorem 1 relates to product
semirings. Given a grammar $G = (N, Z, P, \mu)$ over a semiring $S_1 \times_\otimes S_2$, we use $G^{S_i}$
to denote the grammar $(N, Z, P, \mu_i)$ in which the weight function outputs the
corresponding projected weight, i.e., if $\mu(p) = (s_1, s_2)$, then $\mu_i(p) = s_i$.

Let us first consider the case where the product semiring uses a Pareto partial
order. In this case, if Theorem 1 holds for each grammar $G^{S_i}$ and $w_i \prec_i s_i$, then
it holds for $G$ and $(w_1, w_2) \prec_p (s_1, s_2)$. However, the other direction is not true.
Theorem 3 proves this intuition and states that, in some sense, solving Pareto
partial orders is easier than solving the individual partial orders.


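Theorem 3. Given an unambiguous WTG $G$ over the semiring $S = S_1 \times_\otimes S_2$
with Pareto partial order $\preceq_p = \mathrm{PAR}(\preceq_1, \preceq_2)$ and a weight $s = (s_1, s_2) \in S$,
if the constructions of $G^{S_1}_{\prec_1 s_1}$ and $G^{S_2}_{\prec_2 s_2}$ terminate, then the construction of $G_{\prec_p s}$
terminates.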


When we move to the Sorted partial order we cannot get an analogous theorem: if
Theorem 1 holds for each grammar $G^{S_i}$ and $w_i \prec_i s_i$, then it does not necessarily
hold for $G$ and $(w_1, w_2) \prec_s (s_1, s_2)$. In particular, if the semiring $S$ is infinite and
there exists an $s' \prec s_1$, there will be infinitely many elements $(s', \_) \prec_s (s_1, s_2)$.
Using this observation, we devise a modified algorithm for reducing grammars
with sorted objectives. First, we compute the grammars $G^{S_1}_{\prec_1 s_1}$, $G^{S_1}_{= s_1}$, and $G^{S_2}_{\prec_2 s_2}$.
Second, we use WTG closure properties to compute $G_{\prec_s (s_1, s_2)}$ as the union of
$G^{S_1}_{\prec_1 s_1}$ and $\mathrm{INTERSECT}(G^{S_1}_{= s_1}, G^{S_2}_{\prec_2 s_2})$.
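The union/intersection decomposition for sorted objectives can be sanity-checked on plain sets of weight pairs. The sketch below is only an illustration under stated assumptions: both component orders are the numeric `<`, sets of pairs stand in for the tree languages of the three grammars, and the finite universe is invented for the check.

```python
# Hypothetical finite sample of weight pairs; the real weight space may be infinite.
UNIVERSE = [(a, b) for a in range(4) for b in range(4)]
BOUND = (2, 1)  # the pair (s1, s2)

below_first = {w for w in UNIVERSE if w[0] < BOUND[0]}    # first weight below s1
equal_first = {w for w in UNIVERSE if w[0] == BOUND[0]}   # first weight equal to s1
below_second = {w for w in UNIVERSE if w[1] < BOUND[1]}   # second weight below s2

# Union of the first set with the intersection of the other two.
decomposed = below_first | (equal_first & below_second)

# Python compares tuples lexicographically, which matches the sorted order here.
direct = {w for w in UNIVERSE if w < BOUND}

print(decomposed == direct)
```

The first set places no constraint on the second component, which is why it becomes infinite as soon as the second weight space is infinite and why the grammar-level closure operations are needed instead of direct enumeration.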


General formulas. We can now inductively construct the grammar accepting only
terms satisfying all constraints in $\omega$. We again use the fact that tree grammars
are closed under Boolean operations to compute intersections and unions and
correctly characterize all conjunctions and disjunctions appearing in the formulas.


4.2 Finding an Optimal Solution 


If our QSyGuS problem does not require minimization (i.e., OPT = false), the
technique presented in Sect. 4.1 can be used to generate an equivalent SyGuS
problem $P' = (T, \varphi(f), G')$, which can be solved using off-the-shelf SyGuS
solvers. In this section, we show how to extend this technique to handle
minimization objectives. Our idea is to use SyGuS solvers to find a non-optimal
solution for $P'$ and then iteratively refine our grammar $G'$ to search for a better
solution. This loop is illustrated in Algorithm 1 (lines 5-9). Given the initial
solution $f^*$ to $P'$ such that $w_G(f^*) = s$, we can construct a new grammar $G_{\prec s}$ and
look for a solution with lower weight. If the SyGuS solver we use is sound (it
can find a solution if it exists) and complete (it can detect if a solution does
not exist), Algorithm 1 terminates with an optimal solution.

In general, the above conditions are too strict, and in practice this implies
that the algorithm will often not terminate. However, if the SyGuS solver is
sound, Algorithm 1 will eventually find the optimal solution, but it will not
be able to prove that no smaller one exists. In our experiments, we will show
that this approach can yield better solutions than those given by vanilla SyGuS
solvers even when Algorithm 1 does not terminate.
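The refinement loop described above can be sketched as follows. This is not Algorithm 1 verbatim: the mock `solve` function stands in for "reduce the grammar to weight below the bound and call a SyGuS solver", and the candidate pool, test inputs, and `max_iterations` budget are hypothetical devices for a self-contained example.

```python
# Hypothetical candidate pool standing in for the search space of a SyGuS solver.
CANDIDATES = [
    ("x if x > y else (y if y > x else x)",
     lambda x, y: x if x > y else (y if y > x else x)),
    ("x if x > y else y", lambda x, y: x if x > y else y),
    ("x", lambda x, y: x),
]
TESTS = [(0, 1), (1, 0), (2, 2), (-3, 5)]

def weight(term):
    """Syntactic cost: number of conditionals in the term text."""
    return term[0].split().count("if")

def solve(bound):
    """Mock solver: first candidate computing max(x, y) with weight < bound."""
    for term in CANDIDATES:
        correct = all(term[1](x, y) == max(x, y) for x, y in TESTS)
        if correct and (bound is None or weight(term) < bound):
            return term
    return None

def optimize(solve, weight, max_iterations=10):
    """Return (best solution, whether optimality was proven)."""
    best = solve(None)                 # initial, unconstrained call
    if best is None:
        return None, False             # the problem has no solution at all
    for _ in range(max_iterations):
        candidate = solve(weight(best))
        if candidate is None:
            return best, True          # nothing strictly better exists
        best = candidate
    return best, False                 # budget exhausted, optimality unproven

best, proven = optimize(solve, weight)
print(best[0], proven)
```

The `False` flag in the second return path corresponds to the situation described in the text: a better solution has been found, but the solver could not show that no smaller one exists.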


5 Implementation and Evaluation 


First, we extended the SyGuS format with new syntax for expressing QSyGuS
problems. Our format supports all semirings presented in Sect. 3.1 as well as
additional ones. The format also allows creating tuples of semirings using the
product operation described in Sect. 3.1. We augment the original SyGuS syntax
to support weights on grammar productions. Weight constraints are added using
an SMT-like syntax.

Second, we implemented Algorithm 1 in a tool called QUASI. QUASI already 
interfaces with three SYGUS solvers: CVC4 [6], ESolver [4], and EUSolver [5]. 
QUASI supports all the semirings allowed in our format and implements a library 
for tree automata/grammars and weighted tree automata/grammars operations, 
as well as several optimizations we did not discuss in the paper. In particular, 
QUASI often uses simple grammar reduction techniques to simplify the generated 
grammars, remove unnecessary productions, and consolidate equivalent ones. 

We evaluate QUASI through the following questions (experiments performed
on an Intel Core i7 4.00 GHz CPU with 32 GB of RAM).


Q1 Can QUASI solve quantitative variants of real SyGuS benchmarks? 
(Sect. 5.1) 

Q2 What is the overhead of synthesizing optimal solutions? (Sect. 5.2)

Q3 How do multiple iterations of Algorithm 1 affect the solution's weight? 
(Sect. 5.3) 

Q4 Can QUASI solve QSYGUS problems with multiple objectives? (Sect. 5.4) 


Benchmarks. We perform our evaluation on 26 quantitative extensions of existing
SyGuS competition benchmarks taken from 4 SyGuS benchmark tracks
[4]: Hacker's Delight, Integers, ICFP, and Bitvector. 18 of our benchmarks only
use a minimization objective over a single semiring (Table 1), while 8 use a
minimization objective (Pareto or Sorted) over a product semiring (Table 2). We
select SyGuS benchmarks using the following criteria: (i) the benchmark can
be solved by either CVC4 [6] or ESolver [4], and (ii) the solution is not optimal
according to some reasonable metric, e.g., size or number of if statements.


5.1 Effectiveness of QSyGuS Solver 


We evaluate the effectiveness of QUASI on the 18 single-minimization-objective
benchmarks. For each benchmark, we run QUASI using either CVC4 or ESolver
as the backend SyGuS solver (we also evaluated QUASI using EUSolver [5], but,
due to its poor performance, we do not report the results). The results are shown
in Table 1. The timeout for each iteration of Algorithm 1 is 10 min.

With CVC4, QUASI terminates with an optimal solution in 9/18 benchmarks,
taking less than 5s (avg: 0.7s) to solve each sub-problem. In 3 of these cases,
the initial solution is already optimal and the second iteration is used to prove
optimality. With ESolver, QUASI terminates with an optimal solution in 8/18
benchmarks, taking less than 7s (avg: 0.9s) to solve each sub-problem. In 1
case, it can find a better solution than the original one, but it cannot prove
that the solution is optimal. Overall, by combining solvers, QUASI can find a
better solution than the original SyGuS solution given by one of the two solvers
in 9/18 benchmarks. QUASI cannot improve the initial solution of the linear
integer arithmetic benchmarks (array_search and LinExpr_eq1ex).

Both solvers time out on large grammars. The grammars in Table 1 are 1 to 2
orders of magnitude larger than those in existing SyGuS benchmarks (avg: 224 vs
13 rules) and existing solvers have not yet been optimized for this parameter. In
some cases, the solver times out for intermediate grammars that do not contain
a solution, but that generate infinitely many terms. In general, existing SyGuS
solvers cannot prove unsatisfiability for these types of problems. To answer Q1,
QUASI can solve quantitative variants of 10/18 real SyGuS benchmarks.


Table 1. Performance of QUASI. Time shows the sequence of times taken to solve 
individual iterations of Algorithm 1. Largest is the size of the largest SyGuS sub- 
problem. Grammar Size is the number of rules in the original grammar. 


Semiring | Problem         | CVC4 Time [sec] | CVC4 Largest | ESolver Time [sec] | ESolver Largest | Grammar Size
Trop     | max_ite(2,3)    | 0.1+0.1         | 42           | 0.1                | 42              | 13
Trop     | max_ite(2,15)   | 0.1+0.1         | 239          | 0.3                | 239             | 13
Trop     | max_ite(3,15)   | 0.1+0.1+0.1     | 238          | OOM                | 238             | 13
Trop     | max_ite(10,15)  | 0.5+0.5+0.9     | 226          | OOM                | 226             | 13
Trop     | parity_not      | 0.1+TO          | 301          | 26.9+TO            | 43              | 6
Trop     | max3_ite        | 0.1+TO          | 31           | OOM                | -               | 14
Trop     | array_search_3  | 0.1+TO          | 135          | TO                 | -               | 15
Trop     | array_search_5  | 0.1+TO          | 108          | TO                 | -               | 16
Trop     | hackers_5       | 0.1+0.1         | 27           | 0.1+0.1+0.1        | 35              | 13
Trop     | hackers_7       | 0.1+0.3         | 35           | 0.1+0.1+0.2        | 41              | 13
Trop     | hackers_17      | 0.1+0.7         | 41           | 2.8+3.0+1.0        | 62              | 13
Trop     | hackers_19      | 0.2+TO          | 174          | TO                 | -               | 13
Trop     | icfp_7          | 0.2+TO          | 146          | TO                 | -               | 11
Trop     | LinExpr_eq1ex   | 0.7+TO          | 1717         | TO                 | -               | 14
Prob     | hackers_2_prob  | 0.6+4.1+0.1     | 95           | 0.8+0.1+0.2        | 154             | 13
Prob     | hackers_5_prob  | 0.1+0.9+0.1     | 96           | 0.1+0.2+0.1        | 154             | 13
Prob     | hackers_7_prob  | 0.1+TO          | 162          | 0.1+0.1+0.2        | 212             | 13
Prob     | hackers_17_prob | 0.1+TO          | 187          | 3.4+6.5+OOM        | 291             | 13
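The counts quoted in the text (9/18 for CVC4, 8/18 for ESolver, 10/18 combined) can be recomputed from the time columns of Table 1. The sketch below transcribes those columns by hand and assumes that a run "terminates with an optimal solution" exactly when its time sequence contains neither TO nor OOM; that reading is an interpretation of the table, not a definition given by the authors.

```python
# Time sequences transcribed from Table 1 as (CVC4, ESolver).
TABLE1 = {
    "max_ite(2,3)": ("0.1+0.1", "0.1"),
    "max_ite(2,15)": ("0.1+0.1", "0.3"),
    "max_ite(3,15)": ("0.1+0.1+0.1", "OOM"),
    "max_ite(10,15)": ("0.5+0.5+0.9", "OOM"),
    "parity_not": ("0.1+TO", "26.9+TO"),
    "max3_ite": ("0.1+TO", "OOM"),
    "array_search_3": ("0.1+TO", "TO"),
    "array_search_5": ("0.1+TO", "TO"),
    "hackers_5": ("0.1+0.1", "0.1+0.1+0.1"),
    "hackers_7": ("0.1+0.3", "0.1+0.1+0.2"),
    "hackers_17": ("0.1+0.7", "2.8+3.0+1.0"),
    "hackers_19": ("0.2+TO", "TO"),
    "icfp_7": ("0.2+TO", "TO"),
    "LinExpr_eq1ex": ("0.7+TO", "TO"),
    "hackers_2_prob": ("0.6+4.1+0.1", "0.8+0.1+0.2"),
    "hackers_5_prob": ("0.1+0.9+0.1", "0.1+0.2+0.1"),
    "hackers_7_prob": ("0.1+TO", "0.1+0.1+0.2"),
    "hackers_17_prob": ("0.1+TO", "3.4+6.5+OOM"),
}

def terminated(times):
    """Assumed criterion: no iteration timed out or ran out of memory."""
    steps = times.split("+")
    return "TO" not in steps and "OOM" not in steps

cvc4 = {name for name, (c, _) in TABLE1.items() if terminated(c)}
esolver = {name for name, (_, e) in TABLE1.items() if terminated(e)}

print(len(cvc4), len(esolver), len(cvc4 | esolver), len(TABLE1))
```

Under this reading, hackers_7_prob is the one benchmark that ESolver finishes and CVC4 does not, which accounts for the combined total.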
5.2 Solving Time for Different Iterations


In this section, we evaluate the time required by each iteration of Algorithm
1. Figure 2 shows the ratio of time taken by each iteration with respect to the
initial non-quantitative SyGuS solving time. Some of the iterations shown in
Table 1 do not appear in Fig. 2 since they resulted in no solution, i.e., the initial
solution was minimal. CVC4 is typically slower in subsequent iterations and can
take up to 10 times the original solving time, while ESolver has comparable
runtime to the initial run and is often faster. These numbers are largely due
to how the two solvers work: CVC4 is optimized to solve problems where the
grammar imposes no restrictions on the structure of the solution, while ESolver
performs enumerative search and takes advantage of more restrictive grammars.
One interesting point is the parity_not benchmark. ESolver takes 26.9s to
find an initial solution. But, with a weight constraint w < 11, a solution
can be found in 2.2s. CVC4 can find the initial solution with weight 11 in
0.1s but cannot solve the next iteration. We tried using different solvers
in different iterations of our algorithm and, in fact, found that, if we use
CVC4 to find an initial solution and then ESolver in subsequent iterations
with restricted grammars, we can fully solve this benchmark in a total of 2.3s,
which is much better than the time taken by a single solver. To answer Q2,
with appropriate choices of solvers the overhead of synthesizing optimal
solutions is minimal.

Fig. 2. Solving time across iterations.


5.3 Solution Weight Across Iterations 


In this section, we present how the weight of the synthesized solutions
changes across iterations of Algorithm 1. Figure 3 shows the percentage
of weight of solutions synthesized at each iteration with respect to the
weight of the initial SyGuS solution. The result shows that we can
improve the solutions of CVC4 by 15-25% in one iteration, and the
solutions of ESolver by 20-50% when taking one iteration and 50-60% when
taking two. The Prob benchmarks, which require two iterations, can be improved
more when using ESolver because ESolver tends to synthesize small terms whose
probability may also be small. To answer Q3, QUASI can improve the weights
of SyGuS solutions by 20-60%.

Fig. 3. Solution weight across iterations.


5.4 Multi-objective Optimization


In this section, we evaluate the effectiveness of QUASI on the 8 benchmarks
involving two minimization objectives. The benchmarks consist of two families,
4 for sorted optimization and 4 for Pareto optimization. The sorted optimization
benchmarks ask to minimize first the number of occurrences of a specified operator
(bvand in the hackers benchmarks and ite in array_search) and then the size of the
solution. The Pareto optimization benchmarks have the same objectives as the sorted
optimization ones, but here we synthesize a Pareto-optimal solution instead of a
sorted-optimal one. The results are shown in Table 2. We do not present the results
using CVC4 because it cannot solve any of the benchmarks.

The array_search benchmarks time out since they are already hard with a single
objective. For the hackers_5 benchmarks, the initial solution is already optimized
for the first objective, so the problem degenerates to the single-objective
optimization problem. For hackers_7 and hackers_17, the weights of the
intermediate solutions show that Pareto and Sorted optimizations yield
different solutions. To answer Q4, QUASI can solve problems with multiple
objectives when the same problems are feasible with a single objective.


Table 2. Performance of QUASI on multi-objective benchmarks. Weight denotes the 
sequence of weights explored during minimization. 


Semiring    | Problem             | Time [sec]   | Weight                | Largest | Size
Trop x Trop | array_search_sorted | TO           | -                     | -       | 15
Trop x Trop | hackers_5_sorted    | 0.1+0.1+0.1  | (0,3) → (0,2)         | 31      | 13
Trop x Trop | hackers_7_sorted    | 0.1+0.3+0.1  | (1,4) → (0,5) → (0,3) | 72      | 13
Trop x Trop | hackers_17_sorted   | 0.1+156.1+TO | (2,5) → (1,4) → (0,6) | 97      | 13
Trop x Trop | array_search_pareto | TO           | -                     | -       | 15
Trop x Trop | hackers_5_pareto    | 0.1+0.1+0.1  | (0,3) → (0,2)         | 31      | 13
Trop x Trop | hackers_7_pareto    | 0.1+0.3+0.1  | (1,4) → (1,3) → (0,3) | 74      | 13
Trop x Trop | hackers_17_pareto   | 0.1+9.1+0.1  | (2,5) → (2,4) → (1,4) | 54      | 13
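The weight sequences in Table 2 illustrate how the two orders differ. The sketch below checks each sequence against both orders, assuming that the sorted order is lexicographic comparison of the pairs and that the strict Pareto order means "no worse in both components and not equal"; these definitions are assumptions made for the check, and the sequences are transcribed from the table.

```python
# Weight sequences transcribed from Table 2.
SEQUENCES = {
    "hackers_5_sorted": [(0, 3), (0, 2)],
    "hackers_7_sorted": [(1, 4), (0, 5), (0, 3)],
    "hackers_17_sorted": [(2, 5), (1, 4), (0, 6)],
    "hackers_5_pareto": [(0, 3), (0, 2)],
    "hackers_7_pareto": [(1, 4), (1, 3), (0, 3)],
    "hackers_17_pareto": [(2, 5), (2, 4), (1, 4)],
}

def sorted_lt(w, s):
    """Lexicographic order: first objective dominates, second breaks ties."""
    return w < s

def pareto_lt(w, s):
    """Strict Pareto order: no worse in either objective, and different."""
    return w[0] <= s[0] and w[1] <= s[1] and w != s

def improves(sequence, lt):
    """True if every step is strictly better than the previous one under lt."""
    return all(lt(b, a) for a, b in zip(sequence, sequence[1:]))

for name, seq in SEQUENCES.items():
    print(name, improves(seq, sorted_lt), improves(seq, pareto_lt))
```

Every Pareto improvement is also a sorted improvement, but not the other way round: the step from (1,4) to (0,5) in hackers_7_sorted trades a larger second component for a smaller first one, which the Pareto order does not accept.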


6 Related Work 


Qualitative Synthesis. Existing program synthesizers fall into three categories: (i)
enumeration solvers, which typically output the smallest program [1], (ii)
symbolic solvers, which reduce the synthesis problem to a constraint solving problem
and output whatever program is produced by the constraint solver [21], and (iii)
probabilistic synthesizers, which randomly search the space for a solution and
are typically unpredictable [18]. Since the introduction of the SyGuS format [2],
these techniques have been used to build several SyGuS solvers that have
competed in SyGuS competitions [4]. The most effective ones, which are used in this
paper, are ESolver and EUSolver [1] (enumeration), and CVC4 [6] (symbolic).

Quantitative synthesis. Domain-specific synthesizers typically employ
hard-coded ranking functions that guide the search towards a "preferable" program
[17], but these functions are typically hard to write and are decoupled from
the functional specification. Unlike QSyGuS, these synthesizers allow arbitrary
ranking functions to be expressed in general purpose languages, but typically
only support limited grammars for synthesis. Moreover, in many practical
applications the ranking functions are very simple. For example, the popular
spreadsheet formula synthesizer FlashFill [12] uses a ranking function to prefer small
programs with few constants. This type of objective is expressible in our
framework.

The Sketch synthesizer supports optimization objectives over variables in
sketched programs [20]. This work differs from ours in that sketches are a
different specification mechanism from SyGuS. In Sketch the search space is encoded
as a program with holes to facilitate synthesis by constraint solving. Translating
SyGuS problems into sketches is non-trivial and results in poor performance.

The work closest to ours is Synapse [7], which combines sketching with an
approach similar to ours. For the same reasons as for Sketch, Synapse differs
from our work because it proposes a different search space mechanism.
However, there are a few analogies between our work and Synapse that are worth
explaining in detail. Synapse supports syntactic cost functions that are defined
using a decidable theory, and separately from the sketch search space. Synthesis
is done using an iterative search where sketches (i.e., sets of partial programs
with holes) of increasing sizes are given to the synthesizer. At a high level,
the intermediate sketches are related to our notion of reduced grammars, i.e.,
they accept solutions of weight less than a given constant. However, while our
algorithm generates reduced grammars automatically for a well-defined family
of semirings, Synapse requires the user to provide a function for generating the
intermediate sketches. Moreover, since Synapse requires cost functions that are
defined using a decidable theory, it would not support certain families of costs
QSyGuS supports, e.g., the probabilistic semiring.

Koukoutos et al. [15] have proposed the use of probabilistic tree grammars to
guide the search of enumerative synthesizers on applications outside of SyGuS.
Their algorithm enumerates all terms accepted by the grammar in decreasing
probability using a variant of the search algorithm A* and requires the grammar
to not contain transitions of weight 1 to avoid getting stuck. Probabilistic tree
grammars are a special case of QSyGuS and our algorithm does not impose
limitations on what weights can appear in the grammar. Moreover, our algorithm
does not require implementing a new solver when changing the cost semiring.


7 Conclusion 


We presented QSyGuS, a general framework for defining and solving SyGuS
problems in the presence of quantitative objectives over the syntax of the
programs. QSyGuS is (i) natural: it requires minimal modification to the SyGuS
format, (ii) general: it supports complex but practical types of weights, (iii)
formal: it is grounded in the theory of weighted tree grammars, and (iv) effective:
our tool QUASI can solve quantitative variations of existing SyGuS benchmarks
with little overhead. In the future, we plan to extend our framework to handle
probabilistic objectives and quantitative objectives over the semantics of the
program, e.g., synthesize programs that satisfy most of the specification.


Acknowledgements. The authors were supported by National Science Foundation 
Grants CCF-1637516, CCF-1704117 and a Google Research Award. 


References 


1. ESolver. https://github.com/abhishekudupa/sygus-comp14 

2. Alur, R., Bodik, R., Juniwal, G., Martin, M.M., Raghothaman, M., Seshia, S.A., 
Singh, R., Solar-Lezama, A., Torlak, E., Udupa, A.: Syntax-guided synthesis. In: 
2013 Formal Methods in Computer-Aided Design (FMCAD), pp. 1-8. IEEE (2013) 

3. Alur, R., Fisman, D., Singh, R., Solar-Lezama, A.: Results and analysis of SyGuS- 
comp 2015. arXiv preprint arXiv:1602.01170 (2016) 

4. Alur, R., Fisman, D., Singh, R., Solar-Lezama, A.: Sygus-comp 2016: results and 
analysis. arXiv preprint arXiv:1611.07627 (2016) 

5. Alur, R., Radhakrishna, A., Udupa, A.: Scaling enumerative program synthesis via 
divide and conquer. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 
10205, pp. 319-336. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54577-5_18

6. Barrett, C., et al.: CVC4. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. 
LNCS, vol. 6806, pp. 171-177. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_14

7. Bornholt, J., Torlak, E., Grossman, D., Ceze, L.: Optimizing synthesis with metas- 
ketches. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium 
on Principles of Programming Languages, POPL 2016, pp. 775-788. ACM, New 
York (2016) 

8. Caulfield, B., Rabe, M.N., Seshia, S.A., Tripakis, S.: What's decidable about 
syntax-guided synthesis? CoRR abs/1510.08393 (2015) 

9. Comon, H., Dauchet, M., Gilleron, R., Löding, C., Jacquemard, F., Lugiez, D.,
Tison, S., Tommasi, M.: Tree automata techniques and applications (2007).
http://www.grappa.univ-lille3.fr/tata. Accessed 12 Oct 2007

10. Droste, M., Kuich, W., Vogler, H.: Handbook of Weighted Automata, 1st edn. 
Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01492-5 

11. Droste, M., Vogler, H.: Weighted tree automata and weighted logics. Theor. Com- 
put. Sci. 366(3), 228-247 (2006). Automata and Formal Languages 

12. Gulwani, S.: Automating string processing in spreadsheets using input-output 
examples. In: Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on 
Principles of Programming Languages, POPL 2011, 26-28 January 2011, Austin, 
TX, USA, pp. 317-330 (2011) 

13. Gulwani, S.: Programming by examples: applications, algorithms, and ambiguity 
resolution. In: Olivetti, N., Tiwari, A. (eds.) IJCAR 2016. LNCS (LNAI), vol. 9706, 
pp. 9-14. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40229-1_2

14. Hu, Q., D'Antoni, L.: Automatic program inversion using symbolic transducers. In: 
Proceedings of the 38th ACM SIGPLAN Conference on Programming Language 
Design and Implementation, PLDI 2017, 18-23 June 2017, Barcelona, Spain, pp.
376-389 (2017)


15. Koukoutos, M., Raghothaman, M., Kneuss, E., Kuncak, V.: On repair with
probabilistic attribute grammars. CoRR abs/1707.04148 (2017)

16. Ngo, V.C., Dehesa-Azuara, M., Fredrikson, M., Hoffmann, J.: Verifying and
synthesizing constant-resource implementations with types. In: 2017 IEEE Symposium
on Security and Privacy (SP), pp. 710-728, May 2017

17. Polozov, O., Gulwani, S.: Flashmeta: a framework for inductive program synthesis.
In: Proceedings of the 2015 ACM SIGPLAN International Conference on
Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015,
part of SPLASH 2015, 25-30 October 2015, Pittsburgh, PA, USA, pp. 107-126
(2015)

18. Schkufza, E., Sharma, R., Aiken, A.: Stochastic program optimization. Commun.
ACM 59(2), 114-122 (2016)

19. Singh, R., Gulwani, S.: Predicting a correct program in programming by example.
In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 398-414.
Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_23

20. Singh, R., Gulwani, S., Solar-Lezama, A.: Automated feedback generation for
introductory programming assignments. In: Proceedings of PLDI 2013, pp. 15-26. ACM,
New York (2013)

21. Solar-Lezama, A.: Program sketching. Int. J. Softw. Tools Technol. Transf. 15(5),
475-495 (2013)

Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.

The images or other third party material in this chapter are included in the chapter's
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter's Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder.


Learning 



Learning Abstractions for Program 
Synthesis 


Xinyu Wang¹, Greg Anderson¹, Isil Dillig¹, and K. L. McMillan²

¹ University of Texas, Austin, USA
{xwang,ganderso,isil}@cs.utexas.edu
² Microsoft Research, Redmond, USA
kenmcmil@microsoft.com


Abstract. Many example-guided program synthesis techniques use 
abstractions to prune the search space. While abstraction-based syn- 
thesis has proven to be very powerful, a domain expert needs to provide 
a suitable abstract domain, together with the abstract transformers of 
each DSL construct. However, coming up with useful abstractions can 
be non-trivial, as it requires both domain expertise and knowledge about 
the synthesizer. In this paper, we propose a new technique for learning 
abstractions that are useful for instantiating a general synthesis frame- 
work in a new domain. Given a DSL and a small set of training problems, 
our method uses tree interpolation to infer reusable predicate templates 
that speed up synthesis in a given domain. Our method also learns suit- 
able abstract transformers by solving a certain kind of second-order con- 
straint solving problem in a data-driven way. We have implemented the 
proposed method in a tool called ATLAS and evaluate it in the context of 
the BLAZE meta-synthesizer. Our evaluation shows that (a) ATLAS can 
learn useful abstract domains and transformers from few training prob- 
lems, and (b) the abstractions learned by ATLAS allow BLAZE to achieve 
significantly better results compared to manually-crafted abstractions. 


1 Introduction 


Program synthesis is a powerful technique for automatically generating pro- 
grams from high-level specifications, such as input-output examples. Due to its 
myriad use cases across a wide range of application domains (e.g., spreadsheet 
automation [1-3], data science [4-6], cryptography [7,8], improving program- 
ming productivity [9-11]), program synthesis has received widespread attention 
from the research community in recent years. 

Because program synthesis is, in essence, a very difficult search problem, 
many recent solutions prune the search space by utilizing program abstrac- 
tions [4, 12-16]. For example, state-of-the-art synthesis tools, such as BLAZE [14], 
MORPHEUS [4] and Scythe [16], symbolically execute (partial) programs over 
some abstract domain and reject those programs whose abstract behavior is 
inconsistent with the given specification. Because many programs share the same 
behavior in terms of their abstract semantics, the use of abstractions allows these 
synthesis tools to significantly reduce the search space. 


© The Author(s) 2018

[Figure: ATLAS approach. Diagram labels: DSL + training problems, Abstraction learner, Synthesizer, Current abstraction, Synthesized programs, Abstraction (predicates + transformers).]

Fig. 1. Schematic overview of our approach.


While the abstraction-guided synthesis paradigm has proven to be quite pow- 
erful, a down-side of such techniques is that they require a domain expert to 
manually come up with a suitable abstract domain and write abstract transform- 
ers for each DSL construct. For instance, the BLAZE synthesis framework [14] 
expects a domain expert to manually specify a universe of predicate templates, 
together with sound abstract transformers for every DSL construct. Unfortu- 
nately, this process is not only time-consuming but also requires significant 
insight about the application domain as well as the internal workings of the 
synthesizer. 

In this paper, we propose a novel technique for automatically learning 
domain-specific abstractions that are useful for instantiating an example-guided 
synthesis framework in a new domain. Given a DSL and a training set of synthe- 
sis problems (i.e., input-output examples), our method learns a useful abstract 
domain in the form of predicate templates and infers sound abstract transform- 
ers for each DSL construct. In addition to eliminating the significant manual 
effort required from a domain expert, the abstractions learned by our method 
often outperform manually-crafted ones in terms of their benefit to synthesizer 
performance. 

The workflow of our approach, henceforth called ATLAS¹, is shown schematically
in Fig. 1. Since ATLAS is meant to be used as an off-line training step
for a general-purpose programming-by-example (PBE) system, it takes as input
a DSL as well as a set of synthesis problems $\mathcal{E}$ that can be used for
training purposes. Given these inputs, our method enters a refinement loop where
an Abstraction Learner component discovers a sequence of increasingly precise
abstract domains $\mathcal{A}_1, \ldots, \mathcal{A}_n$ and their corresponding
abstract transformers $\mathcal{T}_1, \ldots, \mathcal{T}_n$, in order to help the
Abstraction-Guided Synthesizer (AGS) solve all training problems. While the AGS
can reject many incorrect solutions using an abstract domain $\mathcal{A}_i$, it
might still return some incorrect solutions due to the insufficiency of
$\mathcal{A}_i$. Thus, whenever the AGS returns an incorrect solution to
any training problem, the Abstraction Learner discovers a more precise abstract
domain and automatically synthesizes the corresponding abstract transformers.
Upon termination of the algorithm, the final abstract domain $\mathcal{A}_n$ and
transformers $\mathcal{T}_n$ are sufficient for the AGS to correctly solve all
training problems.
Furthermore, because our method learns general abstractions in the form of
predicate templates, the learned abstractions are expected to be useful for
solving many other synthesis problems beyond those in the training set.

¹ ATLAS stands for AuTomated Learning of AbStractions.

From a technical perspective, the Abstraction Learner uses two key ideas,
namely tree interpolation and data-driven constraint solving, for learning useful
abstract domains and transformers respectively. Specifically, given an incorrect
program $\mathcal{P}$ that cannot be refuted by the AGS using the current abstract
domain $\mathcal{A}_i$, the Abstraction Learner generates a tree interpolant
$\mathcal{I}_i$ that serves as a proof of $\mathcal{P}$'s incorrectness and
constructs a new abstract domain $\mathcal{A}_{i+1}$ by extracting templates from
the predicates used in $\mathcal{I}_i$. The Abstraction Learner also synthesizes
the corresponding abstract transformers for $\mathcal{A}_{i+1}$ by setting up a
second-order constraint solving problem where the goal is to find the unknown
relationship between symbolic constants used in the predicate templates. Our method
solves this problem in a data-driven way by sampling input-output examples
for DSL operators and ultimately reduces the transformer learning problem to
solving a system of linear equations.

We have implemented these ideas in a tool called ATLAS and evaluated it in the
context of the BLAZE program synthesis framework [14]. Our evaluation shows 
that the proposed technique eliminates the manual effort involved in design- 
ing useful abstractions. More surprisingly, our evaluation also shows that the 
abstractions generated by ATLAS outperform manually-crafted ones in terms of 
the performance of the BLAZE synthesizer in two different application domains. 

To summarize, this paper makes the following key contributions: 


— We describe a method for learning abstractions (domains/transformers) that 
are useful for instantiating program synthesis frameworks in new domains. 

— We show how tree interpolation can be used for learning abstract domains 
(i.e., predicate templates) from a few training problems. 

— We describe a method for automatically synthesizing transformers for a given 
abstract domain under certain assumptions. Our method is guaranteed to find 
the unique best transformer if one exists. 

— We implement our method in a tool called ATLAS and experimentally evaluate 
it in the context of the BLAZE synthesis framework. Our results demonstrate 
that the abstractions discovered by ATLAS outperform manually-written ones 
used for evaluating BLAZE in two application domains. 


2 Illustrative Example 


Suppose that we wish to use the BLAZE meta-synthesizer to automate the class 
of string transformations considered by FlashFill [1] and BlinkFill [17]. In the 
original version of the BLAZE framework [14], a domain expert needs to come 
up with a universe of suitable predicate templates as well as abstract transform- 
ers for each DSL construct. We will now illustrate how ATLAS automates this 
process, given a suitable DSL and its semantics (e.g., the one used in [17]). 

In order to use ATLAS, one needs to provide a set of synthesis problems
$\mathcal{E}$ (i.e., input-output examples) that will be used in the training
process. Specifically, let us consider the three synthesis problems given below:

$\mathcal{E}_1$ : { “CAV” $\mapsto$ “CAV2018”, “SAS” $\mapsto$ “SAS2018”, “FSE” $\mapsto$ “FSE2018” },
$\mathcal{E}_2$ : { “510.220.5586” $\mapsto$ “510-220-5586” },
$\mathcal{E}_3$ : { “\Company\Code\index.html” $\mapsto$ “\Company\Code\”,
                    “\Company\Docs\Spec\specs.html” $\mapsto$ “\Company\Docs\Spec\” }


In order to construct the abstract domain $\mathcal{A}$ and transformers
$\mathcal{T}$, ATLAS starts with the trivial abstract domain
$\mathcal{A}_0 = \{\top\}$ and transformers $\mathcal{T}_0$, defined
as $[\![F(\top, \ldots, \top)]\!]^{\sharp} = \top$ for each DSL construct $F$.
Using this abstraction, ATLAS invokes BLAZE to find a program $\mathcal{P}_0$ that
satisfies specification $\mathcal{E}_1$ under the current abstraction
$(\mathcal{A}_0, \mathcal{T}_0)$. However, since the program $\mathcal{P}_0$
returned by BLAZE is incorrect with respect to the concrete semantics, ATLAS tries
to find a more precise abstraction that allows BLAZE to succeed.

Towards this goal, ATLAS enters a refinement loop that culminates in the
discovery of the abstract domain
$\mathcal{A}_1 = \{\top, len(\boxed{\alpha}) = c, len(\boxed{\alpha}) \neq c\}$,
where $\alpha$ denotes a variable and $c$ is an integer constant. In other words,
$\mathcal{A}_1$ tracks equality and inequality constraints on the length of
strings. After learning these predicate templates, ATLAS also synthesizes the
corresponding abstract transformers $\mathcal{T}_1$. In particular, for each DSL
construct, ATLAS learns one abstract transformer for each combination of
predicate templates used in $\mathcal{A}_1$. For instance, for the Concat operator
which returns the concatenation $y$ of two strings $x_1, x_2$, ATLAS
synthesizes the following abstract transformers, where $\star$ denotes any predicate:


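$$[\![Concat(\top, \star)]\!]^{\sharp} = \top$$
$$[\![Concat(\star, \top)]\!]^{\sharp} = \top$$
$$[\![Concat(len(x_1) \neq c_1, len(x_2) \neq c_2)]\!]^{\sharp} = \top$$
$$[\![Concat(len(x_1) = c_1, len(x_2) = c_2)]\!]^{\sharp} = (len(y) = c_1 + c_2)$$
$$[\![Concat(len(x_1) = c_1, len(x_2) \neq c_2)]\!]^{\sharp} = (len(y) \neq c_1 + c_2)$$
$$[\![Concat(len(x_1) \neq c_1, len(x_2) = c_2)]\!]^{\sharp} = (len(y) \neq c_1 + c_2)$$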


Since the AGS can successfully solve $\mathcal{E}_1$ using
$(\mathcal{A}_1, \mathcal{T}_1)$, ATLAS now moves on to the next training problem.

For synthesis problem $\mathcal{E}_2$, the current abstraction
$(\mathcal{A}_1, \mathcal{T}_1)$ is not sufficient for BLAZE to discover the
correct program. After processing $\mathcal{E}_2$, ATLAS refines
the abstract domain to the following set of predicate templates:


$$\mathcal{A}_2 = \{ \top,\ len(\boxed{\alpha}) = c,\ len(\boxed{\alpha}) \neq c,\ charAt(\boxed{\alpha}, i) = c,\ charAt(\boxed{\alpha}, i) \neq c \}.$$


Observe that ATLAS has discovered two additional predicate templates that
track positions of characters in the string. ATLAS also learns the corresponding
abstract transformers $\mathcal{T}_2$ for $\mathcal{A}_2$.

Moving on to the final training problem $\mathcal{E}_3$, BLAZE can already
successfully solve it using $(\mathcal{A}_2, \mathcal{T}_2)$; thus, ATLAS
terminates with this abstraction.


3 Overall Abstraction Learning Algorithm 


Our top-level algorithm for learning abstractions, called LEARNABSTRACTIONS,
is shown in Fig. 2. The algorithm takes two inputs, namely a domain-specific
language $\mathcal{L}$ (both syntax and semantics) as well as a set of training
problems $\mathcal{E}$, where each problem is specified as a set of input-output
examples $\mathcal{E}_i$. The output of our algorithm is a pair
$(\mathcal{A}, \mathcal{T})$, where $\mathcal{A}$ is an abstract domain represented
by a set of predicate templates and $\mathcal{T}$ is the corresponding set of
abstract transformers.


 1: procedure LEARNABSTRACTIONS($\mathcal{L}$, $\mathcal{E}$)
      input: Domain-specific language $\mathcal{L}$ and a set of training problems $\mathcal{E}$.
      output: Abstract domain $\mathcal{A}$ and transformers $\mathcal{T}$.
 2:   $\mathcal{A} \leftarrow \{\top\}$;                                          ▷ Initialization.
 3:   $\mathcal{T} \leftarrow \{\, [\![F(\top, \ldots, \top)]\!]^{\sharp} = \top \mid F \in Constructs(\mathcal{L}) \,\}$;
 4:   for $i \leftarrow 1, \ldots, |\mathcal{E}|$ do
 5:     while true do                                                            ▷ Refinement loop.
 6:       $\mathcal{P} \leftarrow$ Synthesize($\mathcal{L}$, $\mathcal{E}_i$, $\mathcal{A}$, $\mathcal{T}$);   ▷ Invoke AGS.
 7:       if $\mathcal{P}$ = null then break;
 8:       if IsCorrect($\mathcal{P}$, $\mathcal{E}_i$) then break;
 9:       $\mathcal{A} \leftarrow \mathcal{A} \cup$ LEARNABSTRACTDOMAIN($\mathcal{P}$, $\mathcal{E}_i$);
10:       $\mathcal{T} \leftarrow$ LEARNTRANSFORMERS($\mathcal{L}$, $\mathcal{A}$);
11:   return $(\mathcal{A}, \mathcal{T})$;


Fig. 2. Overall learning algorithm. Constructs gives the DSL constructs in $\mathcal{L}$.

At a high level, the LEARNABSTRACTIONS procedure starts with the most
imprecise abstraction (just consisting of $\top$) and incrementally improves the
precision of the abstract domain $\mathcal{A}$ whenever the AGS fails to synthesize
the correct program using $\mathcal{A}$. Specifically, the outer loop (lines 4-10)
considers each training instance $\mathcal{E}_i$ and performs a fixed-point
computation (lines 5-10) that terminates when the current abstract domain
$\mathcal{A}$ is good enough to solve problem $\mathcal{E}_i$. Thus,
upon termination, the learned abstract domain $\mathcal{A}$ is sufficiently precise
for the AGS to solve all training problems $\mathcal{E}$.
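
The refinement loop described above can be sketched in Python as follows. This is a minimal illustration, not the ATLAS implementation: the synthesizer, the correctness check, and the two learning procedures are passed in as callbacks, predicate templates are represented as plain strings, and `max_refinements` is a hypothetical safety bound that does not appear in Fig. 2. The sketch also assumes that calling `learn_transformers` on the trivial domain yields the trivial transformers of line 3.

```python
from typing import Callable, Iterable, Optional, Set, Tuple


def learn_abstractions(
    problems: Iterable[object],
    synthesize: Callable[[object, Set[str], object], Optional[object]],
    is_correct: Callable[[object, object], bool],
    learn_abstract_domain: Callable[[object, object], Set[str]],
    learn_transformers: Callable[[Set[str]], object],
    max_refinements: int = 100,  # hypothetical bound, not part of Fig. 2
) -> Tuple[Set[str], object]:
    """Sketch of the loop in Fig. 2 with all components supplied by the caller."""
    domain: Set[str] = {"TOP"}                  # line 2: start with only the top element
    transformers = learn_transformers(domain)   # assumed to give the trivial transformers
    for problem in problems:                    # line 4: one pass per training problem
        for _ in range(max_refinements):        # line 5: refinement loop
            program = synthesize(problem, domain, transformers)   # line 6
            if program is None or is_correct(program, problem):   # lines 7-8
                break
            # Lines 9-10: the program is spurious, so refine the abstraction.
            domain = domain | learn_abstract_domain(program, problem)
            transformers = learn_transformers(domain)
    return domain, transformers
```

Keeping the components as callbacks mirrors the separation between the AGS and the Abstraction Learner; any concrete instantiation would have to supply them.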

Specifically, in order to find an abstraction that is sufficient for solving
$\mathcal{E}_i$, our algorithm invokes the AGS with the current abstract domain
$\mathcal{A}$ and corresponding transformers $\mathcal{T}$ (line 6). We assume that
Synthesize returns a program $\mathcal{P}$ that is consistent with $\mathcal{E}_i$
under abstraction $(\mathcal{A}, \mathcal{T})$. That is, symbolically executing
$\mathcal{P}$ (according to $\mathcal{T}$) on inputs $\mathcal{E}_i^{in}$ yields
abstract values $\varphi$ that are consistent with the outputs $\mathcal{E}_i^{out}$
(i.e., $\forall j.\ \mathcal{E}_{ij}^{out} \in \gamma(\varphi_j)$). However, while
$\mathcal{P}$ is guaranteed to be consistent with $\mathcal{E}_i$ under the abstract
semantics, it may not satisfy $\mathcal{E}_i$ under the concrete semantics. We refer
to such a program $\mathcal{P}$ as spurious.

Thus, whenever the call to IsCorrect fails at line 8, we invoke the
LEARNABSTRACTDOMAIN procedure (line 9) to learn additional predicate templates that
are later added to $\mathcal{A}$. Since the refinement of $\mathcal{A}$ necessitates
the synthesis of new transformers, we then call LEARNTRANSFORMERS (line 10) to learn
a new $\mathcal{T}$. The new abstraction is guaranteed to rule out the spurious
program $\mathcal{P}$ as long as there is a unique best transformer of each DSL
construct for domain $\mathcal{A}$.


4 Learning Abstract Domain Using Tree Interpolation


In this section, we present the LEARNABSTRACTDOMAIN procedure: Given a
spurious program $\mathcal{P}$ and a synthesis problem $\mathcal{E}$ that
$\mathcal{P}$ does not solve, our goal is to find new predicate templates
$\mathcal{A}'$ to add to the abstract domain $\mathcal{A}$ such that
the Abstraction-Guided Synthesizer no longer returns $\mathcal{P}$ as a valid
solution to the synthesis problem $\mathcal{E}$. Our key insight is that we can
mine for such useful predicate templates by constructing a tree interpolation
problem. In what follows, we first review tree interpolants (based on [18]) and
then explain how we use this concept to find useful predicate templates.


Definition 1 (Tree interpolation problem). A tree interpolation problem
$T = (V, r, P, L)$ is a directed labeled tree, where $V$ is a finite set of nodes,
$r \in V$ is the root, $P : (V \setminus \{r\}) \to V$ is a function that maps
children nodes to their parents, and $L : V \to \mathbb{F}$ is a labeling function
that maps nodes to formulas from a set $\mathbb{F}$ of first-order formulas such
that $\bigwedge_{v \in V} L(v)$ is unsatisfiable.


In other words, a tree interpolation problem is defined by a tree $T$ where each
node is labeled with a formula and the conjunction of these formulas is
unsatisfiable. In what follows, we write $Desc(v)$ to denote the set of all
descendants of node $v$, including $v$ itself, and we write $NonDesc(v)$ to denote
all nodes other than those in $Desc(v)$ (i.e., $V \setminus Desc(v)$). Also, given
a set of nodes $V'$, we write $L(V')$ to denote the set of all formulas labeling
nodes in $V'$.

Given a tree interpolation problem $T$, a tree interpolant $\mathcal{I}$ is an
annotation from every node in $V$ to a formula such that the annotation of the root
node is false and the annotation of each node $v$ is entailed by the conjunction of
the annotations of its children nodes together with the label of $v$. More formally,
a tree interpolant is defined as follows:


Definition 2 (Tree interpolant). Given a tree interpolation problem
$T = (V, r, P, L)$, a tree interpolant for $T$ is a function
$\mathcal{I} : V \to \mathbb{F}$ that satisfies the following conditions:

1. $\mathcal{I}(r) = false$;

2. For each $v \in V$: $\big( (\bigwedge_{P(c_i) = v} \mathcal{I}(c_i)) \wedge L(v) \big) \Rightarrow \mathcal{I}(v)$;

3. For each $v \in V$: $Vars(\mathcal{I}(v)) \subseteq Vars(L(Desc(v))) \cap Vars(L(NonDesc(v)))$.


Intuitively, the first condition ensures that $\mathcal{I}$ establishes the
unsatisfiability of formulas in $T$, and the second condition states that
$\mathcal{I}$ is a valid annotation. As standard in Craig interpolation [19,20],
the third condition stipulates a “shared vocabulary” condition by ensuring that
the annotation at each node $v$ refers to the common variables between the
descendants and non-descendants of $v$.


Node                 Label $L(v)$                                                         Interpolant $\mathcal{I}(v)$
$d$ (root)           $v_1$ = “CAV2018”                                                    $false$
$v_1$ (child of $d$)   $len(v_1) = len(v_2) + len(v_3)$                                     $len(v_1) \neq 7$
                       $\wedge\ \forall\, 0 \le i < len(v_2) : v_1[i] = v_2[i]$
                       $\wedge\ \forall\, len(v_2) \le j < len(v_2) + len(v_3) : v_1[j] = v_3[j - len(v_2)]$
$v_2$ (child of $v_1$) $v_2$ = “CAV”                                                        $len(v_2) = 3$
$v_3$ (child of $v_1$) $v_3$ = “18”                                                         $len(v_3) = 2$

Fig. 3. A tree interpolation problem and a tree interpolant (underlined in the
original figure; listed here in the last column).


1: procedure LEARNABSTRACTDOMAIN($\mathcal{P}$, $\mathcal{E}$)
     input: Program $\mathcal{P}$ that does not solve problem $\mathcal{E}$ (set of examples).
     output: Set of predicate templates $\mathcal{A}'$.
2:   $\mathcal{A}' \leftarrow \emptyset$;
3:   for each $(e_{in}, e_{out}) \in \mathcal{E}$ do
4:     if $[\![\mathcal{P}]\!] e_{in} \neq e_{out}$ then
5:       $T \leftarrow$ CONSTRUCTTREE($\mathcal{P}$, $e_{in}$, $e_{out}$);
6:       $\mathcal{I} \leftarrow$ FindTreeItp($T$);
7:       for each $v \in Nodes(T) \setminus \{r\}$ do
8:         $\mathcal{A}' \leftarrow \mathcal{A}' \cup \{$MakeSymbolic$(\mathcal{I}(v))\}$;
9:   return $\mathcal{A}'$;


Fig. 4. Algorithm for learning abstract domain using tree interpolation.


Example 1. Consider the tree interpolation problem $T = (V, r, P, L)$ in Fig. 3,
where $L(v)$ is shown to the right of each node $v$. A tree interpolant
$\mathcal{I}$ for this problem maps each node to the corresponding underlined
formula. For instance, we have $\mathcal{I}(v_1) = (len(v_1) \neq 7)$. It is easy
to confirm that $\mathcal{I}$ is a valid interpolant according to Definition 2.
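
Part of that confirmation can be mechanized. The sketch below checks only the shared-vocabulary condition (condition 3 of Definition 2) for the interpolant of Example 1; condition 2 requires logical entailment and is not checked here. The dictionaries are a hand-written encoding of Fig. 3 (an assumption of this sketch), with the label of $v_1$ abbreviated to its length constraint, which mentions the same variables as the full formula.

```python
import re

# Hand-written encoding of the tree in Fig. 3 (child -> parent); "d" is the dummy root.
PARENT = {"v1": "d", "v2": "v1", "v3": "v1"}
LABEL = {
    "d": 'v1 = "CAV2018"',
    "v1": "len(v1) = len(v2) + len(v3)",  # abbreviated label, same variables as the full one
    "v2": 'v2 = "CAV"',
    "v3": 'v3 = "18"',
}
INTERPOLANT = {
    "d": "false",
    "v1": "len(v1) != 7",
    "v2": "len(v2) = 3",
    "v3": "len(v3) = 2",
}


def variables(formula):
    """Node variables (v1, v2, ...) that occur in a formula string."""
    return set(re.findall(r"\bv\d+\b", formula))


def descendants(node):
    """Desc(node): the node itself plus everything below it."""
    found = {node}
    for child, parent in PARENT.items():
        if parent == node:
            found |= descendants(child)
    return found


def shared_vocabulary_holds(node):
    """Condition 3: Vars(I(v)) is within Vars(L(Desc(v))) and Vars(L(NonDesc(v)))."""
    inside = descendants(node)
    outside = set(LABEL) - inside
    vars_inside = set().union(*(variables(LABEL[v]) for v in inside))
    vars_outside = set().union(*(variables(LABEL[v]) for v in outside))
    return variables(INTERPOLANT[node]) <= (vars_inside & vars_outside)
```

For the root, the set of non-descendants is empty, so the only admissible annotation has no variables, which matches $\mathcal{I}(d) = false$.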


To see how tree interpolation is useful for learning predicates, suppose that
the spurious program $\mathcal{P}$ is represented as an abstract syntax tree (AST),
where each non-leaf node is labeled with the axiomatic semantics of the
corresponding DSL construct. Now, since $\mathcal{P}$ does not satisfy the given
input-output example $(e_{in}, e_{out})$, we are able to use this information to
construct a labeled tree where the conjunction of labels is unsatisfiable. Our key
idea is to mine useful predicate templates from the formulas used in the resulting
tree interpolant.

With this intuition in mind, let us consider the LEARNABSTRACTDOMAIN
procedure shown in Fig. 4: The algorithm uses a procedure called CONSTRUCTTREE
to generate a tree interpolation problem $T$ for each input-output example
$(e_{in}, e_{out})$² that program $\mathcal{P}$ does not satisfy (line 5).
Specifically, letting $\Pi$ denote the AST representation of $\mathcal{P}$, we
construct $T = (V, r, P, L)$ as follows:


— $V$ consists of all AST nodes in $\Pi$ as well as a “dummy” node $d$.

— The root $r$ of $T$ is the dummy node $d$.

— $P$ is a function that maps children AST nodes to their parents and maps the
root AST node to the dummy node $d$.

— $L$ maps each node $v \in V$ to a formula as follows:


$$L(v) = \begin{cases}
v' = e_{out} & v \text{ is the dummy root node with child } v'. \\
v = e_{in} & v \text{ is a leaf representing program input } e_{in}. \\
v = c & v \text{ is a leaf representing constant } c. \\
\phi_F[\vec{v}'/\vec{x}, v/y] & v \text{ represents DSL operator } F \text{ with axiomatic semantics } \phi_F(\vec{x}, y) \text{ and } \vec{v}' \text{ represents children of } v.
\end{cases}$$


² Without loss of generality, we assume that programs take a single input $x$, as
we can always represent multiple inputs as a list.

Essentially, the CONSTRUCTTREE procedure labels any leaf node representing
the program input with the input example $e_{in}$ and the root node with the
output example $e_{out}$. All other internal nodes are labeled with the axiomatic
semantics of the corresponding DSL operator (modulo renaming).³ Observe that
the formula $\bigwedge_{v \in V} L(v)$ is guaranteed to be unsatisfiable since
$\mathcal{P}$ does not satisfy the I/O example $(e_{in}, e_{out})$; thus, we can
obtain a tree interpolant for $T$.
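
The labeling performed by CONSTRUCTTREE can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `Node` class, the pre-order naming of nodes as `v1`, `v2`, ..., and the `SEMANTICS` table, which stores only the length part of Concat's axiomatic semantics as a format string. Formulas are kept as strings rather than solver terms.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Tuple


@dataclass
class Node:
    kind: str                     # "input", "const", or "op"
    value: str = ""               # constant text, or operator name for "op" nodes
    children: List["Node"] = field(default_factory=list)


# Hypothetical, simplified axiomatic semantics: y is the output, x1 and x2 the inputs.
SEMANTICS = {"Concat": "len({y}) = len({x1}) + len({x2})"}


def construct_tree(ast: Node, e_in: str, e_out: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (labels, parent) for the tree interpolation problem of one I/O example."""
    labels: Dict[str, str] = {}
    parent: Dict[str, str] = {}
    fresh = count(1)

    def visit(node: Node) -> str:
        name = f"v{next(fresh)}"          # name nodes in pre-order: v1, v2, ...
        if node.kind == "input":
            labels[name] = f'{name} = "{e_in}"'
        elif node.kind == "const":
            labels[name] = f'{name} = "{node.value}"'
        else:
            kids = [visit(child) for child in node.children]
            for kid in kids:
                parent[kid] = name
            holes = {"y": name}
            holes.update({f"x{i}": kid for i, kid in enumerate(kids, start=1)})
            labels[name] = SEMANTICS[node.value].format(**holes)  # rename x/y to node names
        return name

    top = visit(ast)
    parent[top] = "d"                     # the dummy node d becomes the root
    labels["d"] = f'{top} = "{e_out}"'
    return labels, parent
```

Calling it on the AST of `Concat(x, "18")` with the example ("CAV", "CAV2018") yields labels of the same shape as those in Fig. 3, restricted to the length constraint.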


Example 2. Consider program $\mathcal{P}$ : Concat($x$, “18”) which concatenates
constant string “18” to input $x$. Figure 3 shows the result of invoking
CONSTRUCTTREE for $\mathcal{P}$ and input-output example (“CAV”, “CAV2018”). As
mentioned in Example 1, the tree interpolant $\mathcal{I}$ for this problem is
indicated with the underlined formulas.


Since the tree interpolant $\mathcal{I}$ effectively establishes the incorrectness
of program $\mathcal{P}$, the predicates used in $\mathcal{I}$ serve as useful
abstract values that the synthesizer (AGS) should consider during the synthesis
task. Towards this goal, the LEARNABSTRACTDOMAIN algorithm iterates over each
predicate used in $\mathcal{I}$ (lines 7-8 in Fig. 4) and converts it to a suitable
template by replacing the constants and variables used in $\mathcal{I}(v)$ with
symbolic names (or “holes”). Because the original predicates used in $\mathcal{I}$
may be too specific for the current input-output example, extracting templates
from the interpolant allows our method to learn reusable abstract domains.


Example 3. Given the tree interpolant $\mathcal{I}$ from Example 1,
LEARNABSTRACTDOMAIN extracts two predicate templates, namely,
$len(\boxed{\alpha}) = c$ and $len(\boxed{\alpha}) \neq c$.
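
A minimal sketch of the template extraction step, assuming predicates are given as strings in which node variables are written `v1`, `v2`, ... and constants are integer literals. The function name `make_symbolic` follows MakeSymbolic in Fig. 4, but the string-rewriting approach and the hole names `alpha` and `c` are choices made for this illustration only.

```python
import re


def make_symbolic(predicate: str) -> str:
    """Replace node variables by the hole 'alpha' and integer literals by 'c'."""
    template = re.sub(r"\bv\d+\b", "alpha", predicate)   # variables first, so v1 is not split
    return re.sub(r"\b\d+\b", "c", template)             # then the remaining integer constants


# Interpolant of Example 1 for all nodes except the root (whose annotation is false).
interpolant = ["len(v1) != 7", "len(v2) = 3", "len(v3) = 2"]
templates = {make_symbolic(p) for p in interpolant}
```

Collecting the results in a set merges `len(v2) = 3` and `len(v3) = 2` into a single template, which is why three annotations give only two templates.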


5 Synthesis of Abstract Transformers 


In this section, we turn our attention to the LEARNTRANSFORMERS procedure
for synthesizing abstract transformers $\mathcal{T}$ for a given abstract domain
$\mathcal{A}$. Following presentation in prior work [14], we consider abstract
transformers that are described using equations of the following form:


$$[\![F(\chi_1(x_1, \vec{c}_1), \ldots, \chi_n(x_n, \vec{c}_n))]\!]^{\sharp} = \bigwedge_{1 \le j \le m} \chi'_j(y, \vec{f}_j(\vec{c})) \qquad (1)$$


Here, $F$ is a DSL construct, $\chi_i, \chi'_j$ are predicate templates⁴, $x_i$ is
the $i$'th input of $F$, $y$ is $F$'s output, $\vec{c}_1, \ldots, \vec{c}_n$ are
vectors of symbolic constants, and $\vec{f}_j$ denotes a vector of affine functions
over $\vec{c} = \vec{c}_1, \ldots, \vec{c}_n$. Intuitively, given concrete
predicates describing the inputs to $F$, the transformer returns concrete predicates
describing the output. Given such a transformer $\tau$, let $Outputs(\tau)$ be the
set of pairs $(\chi'_j, \vec{f}_j)$ in Eq. 1.


³ Here, we assume access to the DSL’s axiomatic semantics. If this is not the case
(i.e., we are only given the DSL’s operational semantics), we can still annotate
each node as $v = c$ where $c$ denotes the output of the partial program rooted at
node $v$ when executed on $e_{in}$. However, this may affect the quality of the
resulting interpolant.

⁴ We assume that $\chi'_1, \ldots, \chi'_m$ are distinct.


 1: procedure LEARNTRANSFORMERS($\mathcal{L}$, $\mathcal{A}$)
      input: DSL $\mathcal{L}$ and abstract domain $\mathcal{A}$.
      output: A set of transformers $\mathcal{T}$ for constructs in $\mathcal{L}$ and abstract domain $\mathcal{A}$.
 2:   for each $F \in Constructs(\mathcal{L})$ do
 3:     for $(\chi_1, \ldots, \chi_n) \in \mathcal{A}^n$ do
 4:       $\varphi \leftarrow \top$;                                  ▷ $\varphi$ is output of transformer.
 5:       for $\chi'_j \in \mathcal{A}$ do
 6:         $E \leftarrow$ GENERATEEXAMPLES($\phi_F$, $\chi'_j$, $\chi_1, \ldots, \chi_n$);
 7:         $\vec{f}_j \leftarrow$ Solve($E$);
 8:         if $\vec{f}_j \neq$ null $\wedge$ Valid($\Lambda[\vec{f}_j]$) then $\varphi \leftarrow (\varphi \wedge \chi'_j(y, \vec{f}_j(\vec{c}_1, \ldots, \vec{c}_n)))$;
 9:       $\mathcal{T} \leftarrow \mathcal{T} \cup \{ [\![F(\chi_1(x_1, \vec{c}_1), \ldots, \chi_n(x_n, \vec{c}_n))]\!]^{\sharp} = \varphi \}$;
10:   return $\mathcal{T}$;


Fig. 5. Algorithm for synthesizing abstract transformers. $\phi_F$ at line 6 denotes
the axiomatic semantics of DSL construct $F$. Formula $\Lambda$ at line 8 refers to Eq. 5.


We define the soundness of a transformer $\tau$ for DSL operator $F$ with respect
to $F$’s axiomatic semantics $\phi_F$. In particular, we say that the abstract
transformer from Eq. 1 is sound if the following implication is valid:


$$\Big( \phi_F(\vec{x}, y) \wedge \bigwedge_{1 \le i \le n} \chi_i(x_i, \vec{c}_i) \Big) \Rightarrow \bigwedge_{1 \le j \le m} \chi'_j(y, \vec{f}_j(\vec{c})) \qquad (2)$$


That is, the transformer for $F$ is sound if the (symbolic) output predicate is
indeed implied by the (symbolic) input predicates according to $F$’s semantics.
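
As a sanity check of this definition, the following sketch tests the Concat transformers from Sect. 2 on random concrete strings: whenever both input predicates hold, the output predicate must hold on the concatenation. Predicates are encoded as tuples (`("top",)`, `("eq", c)`, `("ne", c)`), which is an encoding chosen for this illustration. Random testing can only find counterexamples to Eq. 2; it does not establish validity, for which a solver-based check is needed.

```python
import random

TOP = ("top",)


def concat_transformer(p1, p2):
    """Abstract Concat over length predicates, following the transformers in Sect. 2."""
    kinds = (p1[0], p2[0])
    if kinds == ("eq", "eq"):
        return ("eq", p1[1] + p2[1])
    if kinds in (("eq", "ne"), ("ne", "eq")):
        return ("ne", p1[1] + p2[1])
    return TOP  # any other combination carries no usable length information


def holds(pred, s):
    """Does the concrete string s satisfy the length predicate pred?"""
    if pred[0] == "top":
        return True
    return (len(s) == pred[1]) if pred[0] == "eq" else (len(s) != pred[1])


def random_predicate(rng):
    kind = rng.choice(["top", "eq", "ne"])
    return TOP if kind == "top" else (kind, rng.randint(0, 6))


def find_counterexample(trials=5000, seed=0):
    """Search for inputs where the input predicates hold but the output predicate fails."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = "a" * rng.randint(0, 6), "b" * rng.randint(0, 6)
        p1, p2 = random_predicate(rng), random_predicate(rng)
        if holds(p1, x1) and holds(p2, x2) and not holds(concat_transformer(p1, p2), x1 + x2):
            return (x1, x2, p1, p2)
    return None
```

An unsound variant, for example one that returns `("eq", c1 + c2)` when only one input length is known exactly, would be caught by this search.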

Our key observation is that the problem of learning sound transformers can
be reduced to solving the following second-order constraint solving problem:


$$\exists \vec{f}.\ \forall \vec{V}.\ \Big( \big( \phi_F(\vec{x}, y) \wedge \bigwedge_{1 \le i \le n} \chi_i(x_i, \vec{c}_i) \big) \Rightarrow \bigwedge_{1 \le j \le m} \chi'_j(y, \vec{f}_j(\vec{c})) \Big) \qquad (3)$$


where $\vec{f} = \vec{f}_1, \ldots, \vec{f}_m$ and $\vec{V}$ includes all variables
and functions from Eq. 2 other than $\vec{f}$. In other words, the goal of this
constraint solving problem is to find interpretations of the unknown functions
$\vec{f}$ that make Eq. 2 valid. Our key insight is to solve this problem in a
data-driven way by exploiting the fact that each unknown function $\vec{f}_j$ is
affine.

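Towards this goal, we first express each affine function $f_{j,k}(\vec{c})$ as
follows:


$$f_{j,k}(\vec{c}) = p_{j,k,1} \cdot c_1 + \cdots + p_{j,k,|\vec{c}|} \cdot c_{|\vec{c}|} + p_{j,k,|\vec{c}|+1}$$


where each $p_{j,k,l}$ corresponds to an unknown integer constant that we would like
to learn. Now, arranging the coefficients of functions
$f_{j,1}, \ldots, f_{j,|\vec{f}_j|}$ in $\vec{f}_j$ into a
$|\vec{f}_j| \times (|\vec{c}| + 1)$ matrix $P_j$, we can represent
$\vec{f}_j(\vec{c})$ in the following way:


$$\underbrace{\begin{bmatrix} f_{j,1}(\vec{c}) \\ \vdots \\ f_{j,|\vec{f}_j|}(\vec{c}) \end{bmatrix}}_{\vec{f}_j(\vec{c})^{T}}
= \underbrace{\begin{bmatrix} p_{j,1,1} & \cdots & p_{j,1,|\vec{c}|+1} \\ \vdots & \ddots & \vdots \\ p_{j,|\vec{f}_j|,1} & \cdots & p_{j,|\vec{f}_j|,|\vec{c}|+1} \end{bmatrix}}_{P_j}
\underbrace{\begin{bmatrix} c_1 \\ \vdots \\ c_{|\vec{c}|} \\ 1 \end{bmatrix}}_{\vec{c}^{\dagger}} \qquad (4)$$


where $\vec{c}^{\dagger}$ is $\vec{c}^{T}$ appended with the constant 1.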
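
As a small worked instance of Eq. 4 (an illustration, assuming two input constants $c_1, c_2$ and a single output function, as for the Concat length transformer of Sect. 2), the matrix $P_j$ has one row and three columns:

```latex
\[
  f_{j,1}(\vec{c})
  = \begin{bmatrix} p_{j,1,1} & p_{j,1,2} & p_{j,1,3} \end{bmatrix}
    \begin{bmatrix} c_1 \\ c_2 \\ 1 \end{bmatrix}
  = p_{j,1,1} \cdot c_1 + p_{j,1,2} \cdot c_2 + p_{j,1,3}
\]
% Choosing P_j = [1 1 0] gives f_{j,1}(c) = c_1 + c_2,
% i.e. the output length of Concat is the sum of the input lengths.
```

The last column of $P_j$ holds the constant offset of the affine function, which is why $\vec{c}$ is extended with the constant 1.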

Given this representation, it is easy to see that the problem of synthesizing
the unknown functions $\vec{f}_1, \ldots, \vec{f}_m$ from Eq. 2 boils down to
finding the unknown matrices $P_1, \ldots, P_m$ such that each $P_j$ makes the
following implication valid:


$$\Lambda \equiv \Big( \big( (\vec{c}'_j)^{T} = P_j \vec{c}^{\dagger} \big) \wedge \phi_F(\vec{x}, y) \wedge \bigwedge_{1 \le i \le n} \chi_i(x_i, \vec{c}_i) \Big) \Rightarrow \chi'_j(y, \vec{c}'_j) \qquad (5)$$


Our key idea is to infer these unknown matrices $P_1, \ldots, P_m$ in a data-driven
way by generating input-output examples of the form
$[i_1, \ldots, i_{|\vec{c}|}] \mapsto [o_1, \ldots, o_{|\vec{f}_j|}]$
for each $\vec{f}_j$. In other words, $\vec{i}$ and $\vec{o}$ correspond to
instantiations of $\vec{c}$ and $\vec{f}_j(\vec{c})$ respectively. Given
sufficiently many such examples for every $\vec{f}_j$, we can then
reduce the problem of learning each unknown matrix $P_j$ to the problem of solving
a system of linear equations.

Based on this intuition, the LEARNTRANSFORMERS procedure from Fig. 5
describes our algorithm for learning abstract transformers $\mathcal{T}$ for a given
abstract domain $\mathcal{A}$. At a high level, our algorithm synthesizes one
abstract transformer for each DSL construct $F$ and $n$ argument predicate templates
$\chi_1, \ldots, \chi_n$. In particular, given $F$ and $\chi_1, \ldots, \chi_n$, the
algorithm constructs the “return value” of the transformer as:


$$\varphi = \bigwedge_{1 \le j \le m} \chi'_j(y, \vec{f}_j(\vec{c}))$$


where $\vec{f}_j$ is the inferred affine function for each predicate template $\chi'_j$.

The key part of our LEARNTRANSFORMERS procedure is the inner loop (lines
5-8) for inferring each of these $\vec{f}_j$’s. Specifically, given an output
predicate template $\chi'_j$, our algorithm first generates a set of input-output
examples $E$ of the form $[p_1, \ldots, p_n] \mapsto p_0$ such that
$[\![F(p_1, \ldots, p_n)]\!]^{\sharp} = p_0$ is a sound (albeit overly specific)
transformer. Essentially, each $p_i$ is a concrete instantiation of a predicate
template, so the examples $E$ generated at line 6 of the algorithm can be viewed
as sound input-output examples for the general symbolic transformer given in
Eq. 1. (We will describe the GENERATEEXAMPLES procedure in Sect. 5.1).

Once we generate these examples $E$, the next step of the algorithm is to
learn the unknown coefficients of matrix $P_j$ from Eq. 5 by solving a system of
linear equations (line 7). Specifically, observe that we can use each input-output
example $[p_1, \ldots, p_n] \mapsto p_0$ in $E$ to construct one row of Eq. 4. In
particular, we can directly extract $\vec{c} = \vec{c}_1, \ldots, \vec{c}_n$ from
$p_1, \ldots, p_n$ and the corresponding value of $\vec{f}_j(\vec{c})$ from $p_0$.
Since we have one instantiation of Eq. 4 for each of the input-output examples in
$E$, the problem of inferring matrix $P_j$ now reduces to solving a system of linear
equations of the form $A P_j^{T} = B$ where $A$ is a $|E| \times (|\vec{c}| + 1)$
(input) matrix and $B$ is a $|E| \times |\vec{f}_j|$ (output) matrix. Thus, a
solution to the equation $A P_j^{T} = B$ generated from $E$ corresponds to a
candidate solution for matrix $P_j$, which in turn uniquely defines $\vec{f}_j$.


1: procedure GENERATEEXAMPLES($\phi_F$, $\chi_0, \ldots, \chi_n$)
     input: Semantics $\phi_F$ of operator $F$ and templates $\chi_0, \ldots, \chi_n$ for output and inputs.
     output: A set of valid input-output examples $E$ for DSL construct $F$.
2:   $E \leftarrow \emptyset$;
3:   while $\neg$FullRank($E$) do
4:     Draw $(s_1, \ldots, s_n)$ randomly from distribution $R_F$ over Domain($F$);
5:     $s_0 \leftarrow [\![F(s_1, \ldots, s_n)]\!]$;
6:     $(A_0, \ldots, A_n) \leftarrow$ Abstract($s_0, \chi_0, \ldots, s_n, \chi_n$);
7:     for each $(p_0, \ldots, p_n) \in A_0 \times \cdots \times A_n$ do
8:       if Valid($\bigwedge_{1 \le i \le n} p_i \wedge \phi_F \Rightarrow p_0$) then $E \leftarrow E \cup \{ [p_1, \ldots, p_n] \mapsto p_0 \}$;
9:   return $E$;


Fig. 6. Example generation for learning abstract transformers.

Observe that the call to Solve at line 7 may return null if no affine function
exists. Furthermore, any non-null $\vec{f}_j$ returned by Solve is just a candidate
solution and may not satisfy Eq. 5. For example, this situation can arise if we do
not have sufficiently many examples in $E$ and end up discovering an affine function
that is “over-fitted” to the examples. Thus, the validity check at line 8 of the
algorithm ensures that the learned transformers are actually sound.


5.1 Example Generation 


In our discussion so far, we assumed an oracle that is capable of generating valid
input-output examples for a given transformer. We now explain our
GENERATEEXAMPLES procedure from Fig. 6 that essentially implements this oracle. In a
nutshell, the goal of GENERATEEXAMPLES is to synthesize input-output examples
of the form $[p_1, \ldots, p_n] \mapsto p_0$ such that
$[\![F(p_1, \ldots, p_n)]\!]^{\sharp} = p_0$ is sound where
each $p_i$ is a concrete predicate (rather than symbolic).

Going into more detail, GENERATEEXAMPLES takes as input the semantics
$\phi_F$ of the DSL construct $F$ for which we want to learn a transformer, as well
as the input predicate templates $\chi_1, \ldots, \chi_n$ and output predicate
template $\chi_0$ that are supposed to be used in the transformer. For any example
$[p_1, \ldots, p_n] \mapsto p_0$ synthesized by GENERATEEXAMPLES, each concrete
predicate $p_i$ is an instantiation of the predicate template $\chi_i$ where the
symbolic constants used in $\chi_i$ are substituted with concrete values.

Conceptually, the GENERATEEXAMPLES algorithm proceeds as follows: First,
it generates concrete input-output examples $[s_1, \ldots, s_n] \mapsto s_0$ by
evaluating $F$ on randomly-generated inputs $s_1, \ldots, s_n$ (lines 4-5). Now, for
each concrete I/O example $[s_1, \ldots, s_n] \mapsto s_0$, we generate a set of
abstract I/O examples of the form $[p_1, \ldots, p_n] \mapsto p_0$ (line 6).
Specifically, we assume that the return value $(A_0, \ldots, A_n)$ of Abstract at
line 6 satisfies the following properties for every $p_i \in A_i$:

— $p_i$ is an instantiation of template $\chi_i$.

— $p_i$ is a sound over-approximation of $s_i$ (i.e., $s_i \in \gamma(p_i)$).

— For any other $p'_i$ satisfying the above two conditions, $p'_i$ is not logically
stronger than $p_i$.


In other words, we assume that Abstract returns a set of “best” sound abstractions
of $(s_0, \ldots, s_n)$ under predicate templates $(\chi_0, \ldots, \chi_n)$.

Next, given abstractions $(A_0, \ldots, A_n)$ for $(s_0, \ldots, s_n)$, we consider
each candidate abstract example of the form $[p_1, \ldots, p_n] \mapsto p_0$ where
$p_i \in A_i$. Even though each $p_i$ is a sound abstraction of $s_i$, the example
$[p_1, \ldots, p_n] \mapsto p_0$ may not be valid according to the semantics of
operator $F$. Thus, the validity check at line 8 ensures that each example added to
$E$ is in fact valid.
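
The sampling and abstraction steps can be sketched for a string operator under the single template $len(\boxed{\alpha}) = c$. This is a simplified illustration rather than the procedure of Fig. 6: the best abstraction of a string under this template is its exact length, so each draw yields one abstract example, and the Valid check of line 8 is omitted because it needs an SMT solver. The sampling distribution (strings of length 0 to 8) and the bound `max_draws` are assumptions of this sketch.

```python
import random
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def generate_examples(op, arity, seed=0, max_draws=1000):
    """Collect abstract examples (c_1, ..., c_n) -> c_0 until the input matrix has full rank."""
    rng = random.Random(seed)
    examples = []
    for _ in range(max_draws):
        # Full rank means the rows [c_1, ..., c_n, 1] determine all affine coefficients.
        if rank([(*ins, 1) for ins, _ in examples]) == arity + 1:
            break
        inputs = ["x" * rng.randint(0, 8) for _ in range(arity)]   # lines 4-5: sample and run F
        output = op(*inputs)
        abstract = (tuple(len(s) for s in inputs), len(output))   # line 6: abstract to lengths
        if abstract not in examples:
            examples.append(abstract)
    return examples
```

For instance, `generate_examples(lambda a, b: a + b, 2)` returns a short list of `((c1, c2), c0)` pairs whose rows `[c1, c2, 1]` are linearly independent.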


Example 4. Given abstract domain $\mathcal{A} = \{len(\boxed{\alpha}) = c\}$,
suppose we want to learn an abstract transformer $\tau$ for the Concat operator of
the following form:


$$[\![Concat(len(x_1) = c_1, len(x_2) = c_2)]\!]^{\sharp} = (len(y) = f([c_1, c_2]))$$


We learn the affine function $f$ used in the transformer by first generating
a set $E$ of I/O examples for $f$ (line 6 in LEARNTRANSFORMERS). In particular,
GENERATEEXAMPLES generates concrete input values for Concat at random
and obtains the corresponding output values by executing Concat on the input
values. For instance, it may generate $s_1$ = “abc” and $s_2$ = “de” as inputs, and
obtain $s_0$ = “abcde” as output. Then, it abstracts these values under the given
templates. In this case, we have an abstract example with $p_1 = (len(x_1) = 3)$,
$p_2 = (len(x_2) = 2)$ and $p_0 = (len(y) = 5)$. Since $[p_1, p_2] \mapsto p_0$ is a
valid example, it is added to $E$ (line 8 in GENERATEEXAMPLES). At this point, $E$
is not yet full rank, so the algorithm keeps generating more examples. Suppose it
generates two more valid examples
$(len(x_1) = 1, len(x_2) = 4) \mapsto (len(y) = 5)$
and $(len(x_1) = 6, len(x_2) = 4) \mapsto (len(y) = 10)$. Now $E$ is full rank, so
LEARNTRANSFORMERS computes $f$ by solving the following system of linear equations:


$$\begin{bmatrix} 3 & 2 & 1 \\ 1 & 4 & 1 \\ 6 & 4 & 1 \end{bmatrix} P^{T} = \begin{bmatrix} 5 \\ 5 \\ 10 \end{bmatrix} \;\Longrightarrow\; P = \begin{bmatrix} 1 & 1 & 0 \end{bmatrix}$$


Here, $P$ corresponds to the function $f([c_1, c_2]) = c_1 + c_2$, and this
function defines the sound transformer:
$[\![Concat(len(x_1) = c_1, len(x_2) = c_2)]\!]^{\sharp} = (len(y) = c_1 + c_2)$
which is added to $\mathcal{T}$ at line 9 in LEARNTRANSFORMERS.
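
The linear system of Example 4 can be solved with a few lines of exact arithmetic. The sketch below is an illustration using Python's standard `fractions` module, not the JLinAlg-based solver used by ATLAS; the function name `solve_affine` and the example encoding `((c1, ..., cn), output)` are assumptions. It returns `None` when the examples do not determine the coefficients or when no affine function fits them, mirroring the null result of Solve.

```python
from fractions import Fraction


def solve_affine(examples):
    """Solve for [p_1, ..., p_n, p_0] with output = p_1*c_1 + ... + p_n*c_n + p_0."""
    # One augmented row [c_1, ..., c_n, 1 | output] per example.
    rows = [[Fraction(v) for v in (*inputs, 1, output)] for inputs, output in examples]
    ncols = len(rows[0]) - 1
    rank = 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            return None  # not full rank: the coefficients are not determined
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        rows[rank] = [v / rows[rank][col] for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    if any(row[-1] != 0 for row in rows[rank:]):
        return None  # inconsistent system: no affine function fits all examples
    return [rows[i][-1] for i in range(ncols)]


# The three abstract examples of Example 4: (len(x1), len(x2)) -> len(y).
coefficients = solve_affine([((3, 2), 5), ((1, 4), 5), ((6, 4), 10)])
```

Here `coefficients` equals `[1, 1, 0]`, i.e. $f([c_1, c_2]) = c_1 + c_2$. As noted in Sect. 5, such a solution is only a candidate and still has to pass the validity check.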


6 Soundness and Completeness 


In this section we present theorems stating some of the soundness, completeness, 
and termination guarantees of our approach. All proofs can be found in the 
extended version of this paper [21]. 


Theorem 1 (Soundness of LEARNTRANSFORMERS). Let $\mathcal{T}$ be the set of
transformers returned by LEARNTRANSFORMERS. Then, every $\tau \in \mathcal{T}$ is
sound according to Eq. 2.

The remaining theorems are predicated on the assumptions that for each DSL
construct $F$ and input predicate templates $\chi_1, \ldots, \chi_n$ (i) there
exists a unique best abstract transformer and (ii) the strongest transformer
expressible in Eq. 2 is logically equivalent to the unique best transformer. Thus,
before stating these theorems, we first state what we mean by a unique best
abstract transformer.


Definition 3 (Unique best function). Consider a family of transformers of
the shape
$[\![F(\chi_1(x_1, \vec{c}_1), \ldots, \chi_n(x_n, \vec{c}_n))]\!]^{\sharp} = \chi'(y, \star)$.
We say that $f$ is the unique best function for $(F, \chi_1, \ldots, \chi_n, \chi')$
if (a) replacing $\star$ with $f$ yields a sound transformer, and (b) replacing
$\star$ with any other $f'$ yields a transformer that is either unsound or strictly
worse (i.e., $\chi'(y, f) \Rightarrow \chi'(y, f')$ and
$\chi'(y, f') \not\Rightarrow \chi'(y, f)$).


We now define unique best transformer in terms of unique best function: 


Definition 4 (Unique best transformer). Let $F$ be a DSL construct and
let $(\chi_1, \ldots, \chi_n) \in \mathcal{A}^n$ be the input templates for $F$. We
say that the abstract transformer $\tau$ is a unique best transformer for
$F, \chi_1, \ldots, \chi_n$ if (a) $\tau$ is sound, and (b) for any predicate
template $\chi \in \mathcal{A}$, we have $(\chi, f) \in Outputs(\tau)$ if and only
if $f$ is a unique best function for $(F, \chi_1, \ldots, \chi_n, \chi)$ for some
affine $f$.


Definition 5 (Complete sampling oracle). Let $F$ be a construct, $\mathcal{A}$ an
abstract domain, and $R_F$ a probability distribution over DOMAIN($F$) with finite
support $S$. Further, for any input predicate templates $\chi_1, \ldots, \chi_n$ and
output predicate template $\chi_0$ in $\mathcal{A}$ admitting a unique best function
$f$, let $C(\chi_0, \ldots, \chi_n)$ be the set of tuples $(c_0, \ldots, c_n)$ such
that $(\chi_0(y, c_0), \chi_1(x_1, c_1), \ldots, \chi_n(x_n, c_n)) \in A_0 \times \cdots \times A_n$
and $c_0 = f(c_1, \ldots, c_n)$, where
$A_0 \times \cdots \times A_n$ = ABSTRACT($s_0, \chi_0, \ldots, s_n, \chi_n$) and
$(s_1, \ldots, s_n) \in S$ and $s_0 = [\![F(s_1, \ldots, s_n)]\!]$. The distribution
$R_F$ is a complete sampling oracle if $C(\chi_0, \ldots, \chi_n)$ has full rank for
all $\chi_0, \ldots, \chi_n$.


The following theorem states that LEARNTRANSFORMERS is guaranteed to 
synthesize the best transformer if a unique one exists: 


Theorem 2 (Completeness of LEARNTRANSFORMERS). Given an abstract
domain $\mathcal{A}$ and a complete sampling oracle $R_F$ for $\mathcal{A}$,
LEARNTRANSFORMERS terminates. Further, let $\mathcal{T}$ be the set of transformers
returned and let $\tau$ be the unique best transformer for DSL construct $F$ and
input predicate templates $\chi_1, \ldots, \chi_n \in \mathcal{A}^n$. Then we have
$\tau \in \mathcal{T}$.


Using this completeness (modulo unique best transformer) result, we can now 
state the termination guarantees of our LEARNABSTRACTIONS algorithm: 


Theorem 3 (Termination of LEARNABSTRACTIONS). Given a complete
sampling oracle $R_F$ for every abstract domain and the unique best transformer
assumption, if there exists a solution for every problem
$\mathcal{E}_i \in \mathcal{E}$, then LEARNABSTRACTIONS terminates.


7 Implementation and Evaluation


We have implemented the proposed method as a new tool called ATLAS, which is 
written in Java. ATLAS takes as input a set of training problems, an Abstraction- 
Guided Synthesizer (AGS), and a DSL and returns an abstract domain (in the 
form of predicate templates) and the corresponding transformers. Internally, 
ATLAS uses the Z3 theorem prover [22] to compute tree interpolants and the 
JLinAlg linear algebra library [23] to solve linear equations. 

To assess the usefulness of ATLAS, we conduct an experimental evaluation in 
which our goal is to answer the following two questions: 


1. How does ATLAS perform during training? That is, how many training prob- 
lems does it require and how long does training take? 
2. How useful are the abstractions learned by ATLAS in the context of synthesis? 


7.1 Abstraction Learning 


To answer our first question, we use ATLAS to automatically learn abstractions 
for two application domains: (i) string manipulations and (ii) matrix transforma- 
tions. We provide ATLAS with the DSLs used in [14] and employ BLAZE as the 
underlying Abstraction-Guided Synthesizer. Axiomatic semantics for each DSL 
construct were given in the theory of equality with uninterpreted functions. 


Training Set Information. For the string domain, our training set consists of 
exactly the four problems used as motivating examples in the BlinkFill paper [17]. 
Specifically, each training problem consists of 4-6 examples that demonstrate the 
desired string transformation. For the matrix domain, our training set consists 
of four (randomly selected) synthesis problems taken from online forums. Since 
almost all online posts contain a single input-output example, each training prob- 
lem includes one example illustrating the desired matrix transformation. 


Main Results. Our main results are summarized in Fig. 7. The main takeaway
is that ATLAS can learn abstractions quite efficiently and does not
require a large training set. For example, ATLAS learns 5 predicate templates and
30 abstract transformers for the string domain in a total of 10.2 s. Interestingly,
ATLAS does not need all the training problems to infer these templates: it
converges to the final abstraction after processing just the first training instance.
Furthermore, for the first training instance, it takes ATLAS 4 iterations in the
learning loop (lines 5-10 from Fig. 2) before it converges to the final abstraction.
Since this abstraction is sufficient, it takes just one iteration for each following
training problem to synthesize a correct program.

Looking at the right side of Fig. 7, we observe similar results for the
matrix domain. In particular, ATLAS learns 10 predicate templates and 59
abstract transformers in a total of 22.5 s. Furthermore, ATLAS converges to the
final abstract domain after processing the first three problems, and the number
of iterations for each training instance is also quite small (ranging from 1 to 3).
The learned abstractions can be found in the extended version of this paper [21].


String domain

                         Running time (sec)
       |A|  |T|  Iters.  T_AGS  T_A   T_T   T_total
E1      5    30    4      0.6   0.2   0.2     1.0
E2      5    30    1      4.9   0     0       4.9
E3      5    30    1      0.2   0     0       0.2
E4      5    30    1      4.1   0     0       4.1
Total   5    30    7      9.8   0.2   0.2    10.2

Matrix domain

                         Running time (sec)
       |A|  |T|  Iters.  T_AGS  T_A   T_T   T_total
E1      8    45    3      2.9   0.7   0.5     4.1
E2      8    45    1      2.8   0     0       2.8
E3     10    59    2      0.5   0.3   0.2     1.0
E4     10    59    1     14.6   0     0      14.6
Total  10    59    7     20.8   1.0   0.7    22.5

Fig. 7. Training results. |A|, |T|, and Iters. denote the number of predicate templates,
abstract transformers, and iterations taken per training instance (lines 5-10 from
Fig. 2), respectively. T_AGS, T_A, and T_T denote the times for invoking the synthesizer
(AGS), learning the abstract domain, and learning the abstract transformers,
respectively. T_total shows the total training time in seconds.


          Original BLAZE† benchmarks       Additional benchmarks            All benchmarks
          #Solved         Running time     #Solved         Running time     Time    Running time
                          improvement                      improvement      (sec)   improvement
          BLAZE*  BLAZE†  max.    avg.     BLAZE*  BLAZE†  max.    avg.     avg.    max.    avg.
String      93      91    15.7x   2.1x       40      40    56x     22.3x    2.8     56x     8.3x
Matrix      39      39    6.1x    3.1x       20      19    83x     21.5x    5.0     83x     9.2x


Fig. 8. Improvement of BLAZE* over BLAZE† on string and matrix benchmarks.


7.2 Evaluating the Usefulness of Learned Abstractions 


To answer our second question, we integrated the abstractions synthesized by
ATLAS into the BLAZE meta-synthesizer. In the remainder of this section, we
refer to all instantiations of BLAZE using the ATLAS-generated abstractions
as BLAZE*. To assess how useful the automatically generated abstractions are,
we compare BLAZE* against BLAZE†, which refers to the manually constructed
instantiations of BLAZE described in [14].


Benchmark Information. For the string domain, our benchmark suite consists
of (1) all 108 string transformation benchmarks that were used to evaluate
BLAZE† and (2) 40 additional challenging problems collected from online
forums, which involve manipulating file paths, URLs, etc. The number of examples
for each benchmark ranges from 1 to 400, with a median of 7 examples. For
the matrix domain, our benchmark set includes (1) all 39 matrix transformation
benchmarks in the BLAZE† benchmark suite and (2) 20 additional challenging
problems collected from online forums. We emphasize that the set of benchmarks
used for evaluating BLAZE* is completely disjoint from the set of synthesis
problems used for training ATLAS.


Experimental Setup. We evaluate BLAZE* and BLAZE† using the same DSLs
from the BLAZE paper [14]. For each benchmark, we provide the same set of
input-output examples to BLAZE* and BLAZE†, and use a time limit of 20 min
per synthesis task.


Main Results. Our main evaluation results are summarized in Fig. 8. The
key observation is that BLAZE* consistently improves upon BLAZE† for both
string and matrix transformations. In particular, BLAZE* not only solves more
benchmarks than BLAZE† for both domains, but also achieves about an order
of magnitude speed-up on average for the common benchmarks that both tools
can solve. Specifically, for the string domain, BLAZE* solves 133 (out of 148)
benchmarks within an average of 2.8 s and achieves an average 8.3x speed-up
over BLAZE†. For the matrix domain, we observe a very similar result, where
BLAZE* leads to an overall speed-up of 9.2x on average.

In summary, this experiment confirms that the abstractions discovered by 
ATLAS are indeed useful and that they outperform manually-crafted abstractions 
despite eliminating human effort. 


8 Related Work 


To our knowledge, this paper is the first one to automatically learn abstract 
domains and transformers that are useful for program synthesis. We also believe 
it is the first to apply interpolation to program synthesis, although interpolation 
has been used to synthesize other artifacts such as circuits [24] and strategies 
for infinite games [25]. In what follows, we briefly survey existing work related 
to program synthesis, abstraction learning, and abstract transformer computa- 
tions. 


Program Synthesis. Our work is intended to complement example-guided pro- 
gram synthesis techniques that utilize program abstractions to prune the search 
space [4, 14-16]. For example, SIMPL [15] uses abstract interpretation to speed up 
search-based synthesis and applies this technique to the generation of imperative 
programs for introductory programming assignments. Similarly, SCYTHE [16] and 
MORPHEUS [4] perform enumeration over program sketches and use abstractions 
to reject sketches that do not have any valid completion. Somewhat different 
from these techniques, BLAZE constructs a finite tree automaton that accepts 
all programs whose behavior is consistent with the specification according to the 
DSL's abstract semantics. We believe that the method described in this paper 
can be useful to all such abstraction-guided synthesizers. 


Abstraction Refinement. In verification, as opposed to synthesis, there have
been many works that use Craig interpolants to refine abstractions [20, 26, 27].
Typically, these techniques generalize the interpolants to abstract domains by
extracting a vocabulary of predicates, but they do not generalize by adding
parameters to form templates. In our case, this is essential because interpolants
derived from fixed input values are too specific to be directly useful. Moreover,
we reuse the resulting abstractions for subsequent synthesis problems. In
verification, this would be analogous to re-using an abstraction from one property or
program to the next. It is conceivable that template-based generalization could
be applied in verification to facilitate such reuse.


Abstract Transformers. Many verification techniques use logical abstract
domains [28-32]. Some of these, following Yorsh et al. [33], use sampling with a
decision procedure to evaluate the abstract transformer [34]. Interpolation has
also been used to compile efficient symbolic abstract transformers [35]. However,
these techniques are restricted to finite domains or domains of finite height to
allow convergence. Here, we use infinite parameterized domains to obtain better
generalization; hence, the abstract transformer computation is more challenging.
Nonetheless, the approach might also be applicable in verification.


9 Limitations 


While this paper takes a first step towards automatically inferring useful abstrac- 
tions for synthesis, our proposed method has the following limitations: 


Shapes of Transformers. Following prior work [14], our algorithm assumes that 
abstract transformers have the shape given in Eq. 1. We additionally assume 
that constants c used in predicate templates are numeric values and that func- 
tions in Eq. 1 are affine. This assumption holds in several domains considered 
in prior work [4,14] and allows us to develop an efficient learning algorithm that 
reduces the problem to solving a system of linear equations. 


DSL Semantics. Our method requires the DSL designer to provide the DSL’s log- 
ical semantics. We believe that giving logical semantics is much easier than com- 
ing up with useful abstractions, as it does not require insights about the internal 
workings of the synthesizer. Furthermore, our technique could, in principle, also 
work without logical specifications although the learned abstract domain may 
not be as effective (see Footnote 3 in Sect. 4) and the synthesized transformers 
would not be provably sound. 


UBT Assumption. Our completeness and termination theorems are predicated
on the unique best transformer (UBT) assumption. While this assumption holds
in our evaluation, it may not hold in general. However, as mentioned in Sect. 6,
we can always guarantee termination by including the concrete predicates used
in the interpolant $\mathcal{I}$ in addition to the symbolic templates extracted
from $\mathcal{I}$.


10 Conclusion 


We proposed a new technique for automatically instantiating abstraction-guided
synthesis frameworks in new domains. Given a DSL and a few training problems,
our method automatically discovers a useful abstract domain and the
corresponding transformers for each DSL construct. From a technical perspective,
our method uses tree interpolation to extract reusable templates from failed
synthesis attempts and automatically synthesizes unique best transformers if
they exist. We have incorporated the proposed approach into the BLAZE
meta-synthesizer and shown that the abstractions discovered by ATLAS are very useful.

While we have applied the proposed technique to program synthesis, we 
believe that some of the ideas introduced here are more broadly applicable. For 
instance, the idea of extracting reusable predicate templates from interpolants 
and synthesizing transformers in a data-driven way could also be useful in the 
context of program verification.


Abstract. Symbolic automata (s-FAs) allow transitions to carry predicates
over rich alphabet theories, such as linear arithmetic, and therefore
extend classic automata to operate over infinite alphabets, such as the
set of rational numbers. In this paper, we study the problem of the
learnability of symbolic automata. First, we present MAT*, a novel L*-style
algorithm for learning symbolic automata using membership and equivalence
queries, which treats the predicates appearing on transitions as
their own learnable entities. The main novelty of MAT* is that it can
take as input an algorithm $\Lambda$ for learning predicates in the underlying
alphabet theory, and it uses $\Lambda$ to infer the predicates appearing on the
transitions in the target automaton. Using this idea, MAT* is able to
learn automata operating over alphabet theories in which predicates are
efficiently learnable using membership and equivalence queries. Furthermore,
we prove that a necessary condition for efficient learnability of an
s-FA is that predicates in the underlying algebra are also efficiently
learnable using queries, thus settling the learnability of a large class of
s-FA instances. We implement MAT* in an open-source library and show
that it can efficiently learn automata that cannot be learned using existing
algorithms and that it significantly outperforms existing automata learning
algorithms over large alphabets.


1 Introduction 


In 1987, Dana Angluin showed that finite automata can be learned in polynomial
time using membership and equivalence queries [3]. In this learning model, often
referred to as a minimally adequate teacher (MAT), the teacher can answer
(i) whether a given string belongs to the target language being learned and
(ii) whether a certain automaton is correct and accepts the target language,
providing a counterexample if the automaton is incorrect. Following this result,
her L* algorithm has been studied extensively [16,17], extended to
several variants of finite automata [4,12,20], and applied widely in
program analysis [2,6,7] and program synthesis [25].

Recent work [6,11] developed algorithms which can efficiently learn s-FAs
over certain alphabet theories. These algorithms operate using an underlying
predicate learning algorithm which can learn partitions of the domain using
predicates from counterexamples. While such results give sufficient conditions
under which s-FAs can be efficiently learned, they do not provide any necessary
conditions. More precisely, the following question remains open:


For what alphabet theories can s-FAs be efficiently learned? 


In this paper, we make significant progress towards answering this question
by providing new sufficient and necessary conditions for efficiently learning
symbolic automata. More specifically, we present MAT*, a new algorithm for
learning s-FAs using membership and equivalence queries. The main novelty of
MAT* is that it can accept as input a MAT learning algorithm $\Lambda$ for predicates
in the underlying alphabet theory. Afterwards, MAT* spawns instances of $\Lambda$ to
infer each transition in the target s-FA and efficiently answers membership and
equivalence queries performed by $\Lambda$ using the s-FA membership and equivalence
oracles. The predicate learning algorithms do not need to learn entire partitions
but only individual predicates; therefore, MAT* greatly simplifies the design of
learning algorithms for s-FAs by allowing one to reuse existing learning
algorithms for the underlying alphabet theory. Moreover, MAT* allows the
underlying predicate learning algorithms to perform both membership and equivalence
queries, thus extending the class of efficiently learnable s-FAs to MAT-learnable
alphabet theories, e.g., bit-vector predicates expressed as BDDs.

Furthermore, we show that a necessary condition for efficiently learning a 
symbolic automaton over a Boolean algebra is that the individual predicates in 
the algebra also have to be efficiently learnable. Moreover, we provide a charac- 
terization of the instances which are not efficiently learnable by our algorithm 
and conjecture that such instances are not learnable by any efficient algorithm. 

We implement MAT* in the open-source symbolicautomata library [1] and
evaluate it on 15 regular-expression benchmarks, 1,500 s-FA benchmarks over
bit-vector alphabets, and 18 synthetic benchmarks over infinite alphabets. Our
results show that MAT* can efficiently learn automata over different alphabet
theories, some of which cannot be learned using existing algorithms. Moreover,
for large finite alphabets, MAT* significantly outperforms existing automata
learning algorithms.


Contributions. In summary, our contributions are: 


- MAT*, the first algorithm for learning symbolic automata that operate over
MAT-learnable alphabet theories, i.e., theories in which predicates can be learned
using only membership and equivalence queries (Sect. 3).

- A soundness result for MAT* and new necessary and sufficient conditions
for the learnability of symbolic automata. Moreover, a characterization of the
remaining class for which the learnability is not settled (Sect. 4).

- A modular implementation of MAT* in an existing open-source library
together with a comprehensive evaluation on existing and new
automata-learning benchmarks (Sect. 6).


2 Background


2.1 Boolean Algebras and Symbolic Automata 


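In symbolic automata, transitions carry predicates over a decidable Boolean
algebra. An effective Boolean algebra $\mathcal{A}$ is a tuple
$(\mathfrak{D}, \Psi, [\![\_]\!], \bot, \top, \vee, \wedge, \neg)$ where
$\mathfrak{D}$ is a set of domain elements; $\Psi$ is a set of predicates closed
under the Boolean connectives, with $\bot, \top \in \Psi$; and
$[\![\_]\!] : \Psi \to 2^{\mathfrak{D}}$ is a denotation function such that
(i) $[\![\bot]\!] = \emptyset$, (ii) $[\![\top]\!] = \mathfrak{D}$, and
(iii) for all $\varphi, \psi \in \Psi$,
$[\![\varphi \vee \psi]\!] = [\![\varphi]\!] \cup [\![\psi]\!]$,
$[\![\varphi \wedge \psi]\!] = [\![\varphi]\!] \cap [\![\psi]\!]$, and
$[\![\neg\varphi]\!] = \mathfrak{D} \setminus [\![\varphi]\!]$.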


Example 1 (Equality Algebra). The equality algebra for an arbitrary set
$\mathfrak{D}$ has predicates formed from Boolean combinations of formulas of the
form $\lambda c.\ c = a$ where $a \in \mathfrak{D}$. Formally, $\Psi$ is generated
from the Boolean closure of
$\Psi_0 = \{\varphi_a \mid a \in \mathfrak{D}\} \cup \{\bot, \top\}$, where for all
$a \in \mathfrak{D}$, $[\![\varphi_a]\!] = \{a\}$. Examples of predicates in this
algebra include $\lambda c.\ c = 5 \vee c = 10$ and $\lambda c.\ \neg(c = 0)$.
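As a rough illustration of Example 1, the following Python sketch represents each predicate of the equality algebra as either a finite set of elements or the complement of one, which is enough to stay closed under the Boolean connectives. The names `EqPred`, `eq`, `TOP`, and `BOTTOM` are illustrative assumptions and are not taken from the paper or from the symbolicautomata library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EqPred:
    """A predicate of the equality algebra: a finite set or its complement."""
    elems: frozenset
    negated: bool = False  # True means "every element except those in elems"

    def holds(self, a):
        # Membership of a in the denotation of this predicate.
        return (a in self.elems) != self.negated

    def __invert__(self):
        return EqPred(self.elems, not self.negated)

    def __or__(self, other):
        if not self.negated and not other.negated:
            return EqPred(self.elems | other.elems)
        if self.negated and other.negated:
            # not A  or  not B  ==  not (A and B)
            return EqPred(self.elems & other.elems, True)
        # Exactly one operand is a complement: P or not N == not (N minus P)
        pos, neg = (self, other) if other.negated else (other, self)
        return EqPred(neg.elems - pos.elems, True)

    def __and__(self, other):
        # Conjunction via De Morgan's law.
        return ~(~self | ~other)


def eq(a):
    """The atomic predicate (lambda c. c = a)."""
    return EqPred(frozenset([a]))


BOTTOM = EqPred(frozenset())
TOP = ~BOTTOM

five_or_ten = eq(5) | eq(10)   # lambda c. c = 5 or c = 10
not_zero = ~eq(0)              # lambda c. not (c = 0)
```

With this encoding, `five_or_ten.holds(5)` is true and `not_zero.holds(0)` is false, mirroring the two example predicates above.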


Definition 1 (Symbolic Finite Automata). A symbolic finite automaton
(s-FA) $\mathcal{M}$ is a tuple $(\mathcal{A}, Q, q_{init}, F, \Delta)$ where
$\mathcal{A}$ is an effective Boolean algebra, called the alphabet; $Q$ is a finite
set of states; $q_{init} \in Q$ is the initial state; $F \subseteq Q$ is the set of
final states; and $\Delta \subseteq Q \times \Psi_{\mathcal{A}} \times Q$ is the
transition relation consisting of a finite set of moves or transitions.


Characters are elements of $\mathfrak{D}_{\mathcal{A}}$, and words or strings are
finite sequences of characters, or elements of $\mathfrak{D}_{\mathcal{A}}^*$. The
empty word of length 0 is denoted by $\epsilon$. A move
$\rho = (q_1, \varphi, q_2) \in \Delta$, also denoted by
$q_1 \xrightarrow{\varphi} q_2$, is a transition from the source state $q_1$ to the
target state $q_2$, where $\varphi$ is the guard or predicate of the move. For a
state $q \in Q$, we denote by guard($q$) the set of guards for all moves from $q$.
For a character $a \in \mathfrak{D}_{\mathcal{A}}$, an $a$-move of $\mathcal{M}$,
denoted $q_1 \xrightarrow{a} q_2$, is a move $q_1 \xrightarrow{\varphi} q_2$ such
that $a \in [\![\varphi]\!]$.

An s-FA $\mathcal{M}$ is deterministic if, for all transitions
$(q, \varphi_1, q_1), (q, \varphi_2, q_2) \in \Delta$,
$q_1 \neq q_2 \rightarrow [\![\varphi_1 \wedge \varphi_2]\!] = \emptyset$, i.e., for
each state $q$ and character $a$ there is at most one $a$-move out of $q$. An s-FA
$\mathcal{M}$ is complete if, for all $q \in Q$,
$[\![\bigvee_{(q, \varphi_i, q_i) \in \Delta} \varphi_i]\!] = \mathfrak{D}$, i.e.,
for each state $q$ and character $a$ there exists an $a$-move out of $q$.
Throughout the paper we assume all s-FAs are deterministic and
complete, since determinization and completion are always possible [10]. Given
an s-FA $\mathcal{M} = (\mathcal{A}, Q, q_{init}, F, \Delta)$ and a state
$q \in Q$, we say a word $w = a_1 a_2 \cdots a_k$ is accepted at state $q$ if, for
$1 \le i \le k$, there exist moves $q_{i-1} \xrightarrow{a_i} q_i$ such that
$q_0 = q$ and $q_k \in F$.
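The definitions above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: guards are plain Python callables rather than predicates of an effective Boolean algebra, and the class name `SFA` and the example automaton `has_negative` are hypothetical, not part of the paper's implementation.

```python
class SFA:
    """A deterministic, complete symbolic finite automaton (sketch)."""

    def __init__(self, init, finals, moves):
        self.init = init
        self.finals = set(finals)
        # Each move is (source, guard, target); guard maps a character to bool.
        self.moves = moves

    def step(self, q, a):
        # Collect the targets of all a-moves out of q.
        targets = [t for (s, guard, t) in self.moves if s == q and guard(a)]
        if len(targets) != 1:
            # Zero targets: not complete; several targets: not deterministic.
            raise ValueError(
                f"expected exactly one {a!r}-move out of {q!r}, "
                f"found {len(targets)}")
        return targets[0]

    def run(self, word, q=None):
        # State reached by reading word from q (default: the initial state).
        q = self.init if q is None else q
        for a in word:
            q = self.step(q, a)
        return q

    def accepts(self, word):
        return self.run(word) in self.finals


# Example over the integers: accept words containing a negative number.
has_negative = SFA(
    init="q0",
    finals={"q1"},
    moves=[
        ("q0", lambda x: x < 0, "q1"),
        ("q0", lambda x: x >= 0, "q0"),
        ("q1", lambda x: True, "q1"),
    ],
)
```

Here `has_negative.accepts([1, 2, -3])` holds while `has_negative.accepts([1, 2])` does not, even though the alphabet is infinite and only three moves are stored.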

For a deterministic s-FA $\mathcal{M}$ and a word $w$, we denote by
$\mathcal{M}_q[w]$ the state reached in $\mathcal{M}$ by $w$ when starting at state
$q$. When $q$ is omitted we assume that execution starts at $q_{init}$. For a word
$w = a_1 \cdots a_k$, we use $w[i..] = a_i \cdots a_k$, $w[..i] = a_1 \cdots a_i$,
and $w[i] = a_i$ to denote the suffix starting from the $i$-th position, the prefix
up to the $i$-th position, and the character at the $i$-th position, respectively.
We use $\mathbb{B} = \{\mathbf{T}, \mathbf{F}\}$ to denote the Boolean domain. A
word $w$ is called an access string for state $q \in Q$ if $\mathcal{M}[w] = q$.
For two states $q, p \in Q$, a word $w$ is called a distinguishing string if
exactly one of $\mathcal{M}_q[w]$ and $\mathcal{M}_p[w]$ is final.

2.2 Learning Model


In this paper, we follow the notation from [17]. A concept is a Boolean function
$c : \mathfrak{D} \to \mathbb{B}$. A concept class $C$ is a set of concepts which is
represented using a representation class $\mathcal{R}$. By representation class we
denote a fixed function from strings to concepts in $C$. For example, regular
expressions, DFAs and NFAs are different representation classes for the concept
class of regular languages.

The learning model under which all learning algorithms in this paper operate
is called exact learning from membership and equivalence queries or learning
using a Minimally Adequate Teacher (MAT), and was originally introduced by
Angluin [3]. In this model, to learn an unknown concept $c \in C$, a learning
algorithm has access to two types of queries:


Membership Query: In a membership query $\mathcal{O}(x)$, the input is
$x \in \mathfrak{D}$ and the query returns the value $c(x)$ of the concept on the
given input $x$, i.e., $\mathbf{T}$ if $x$ belongs to the concept and $\mathbf{F}$
otherwise.

Equivalence Query: In an equivalence query $\mathcal{E}(H)$, the input given is a
hypothesis (or model) $H$. The query returns $\mathbf{T}$ if for every
$x \in \mathfrak{D}$, $H(x) = c(x)$. Otherwise, an input $w \in \mathfrak{D}$ is
returned such that $H(w) \neq c(w)$.


An algorithm is a learning algorithm for a concept class $C$ if, for any
$c \in C$, the algorithm terminates with a correct model for $c$ after making a
finite number of membership and equivalence queries. In this paper, we will say
that a learning algorithm is efficient for a concept class $C$ if it learns any
concept $c \in C$ using a number of queries that is polynomial in the size of the
representation of the target concept in $\mathcal{R}$ and the length of the longest
counterexample provided to the algorithm.
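To make the two query types concrete, here is a small Python sketch of a teacher over a finite domain together with a deliberately naive learner. The class `Teacher` and the function `learn` are illustrative assumptions: the learner shown simply memorizes counterexamples, so it is a correct learner in the sense above but not an efficient one in general.

```python
class Teacher:
    """Answers membership and equivalence queries for a hidden concept."""

    def __init__(self, concept, domain):
        self.concept = concept          # hidden Boolean function c
        self.domain = list(domain)      # finite domain, so equivalence is checkable
        self.membership_queries = 0
        self.equivalence_queries = 0

    def membership(self, x):
        self.membership_queries += 1
        return self.concept(x)

    def equivalence(self, hypothesis):
        # Return None if the hypothesis is correct, else a counterexample.
        self.equivalence_queries += 1
        for x in self.domain:
            if hypothesis(x) != self.concept(x):
                return x
        return None


def learn(teacher):
    """Naive exact learner: start from 'always False' and patch mistakes."""
    table = {}

    def hypothesis(x):
        return table.get(x, False)

    while True:
        cex = teacher.equivalence(hypothesis)
        if cex is None:
            return hypothesis
        # Ask a membership query to record the true label of the counterexample.
        table[cex] = teacher.membership(cex)


teacher = Teacher(lambda x: x % 3 == 0, range(10))
model = learn(teacher)
```

In this run the learner needs one equivalence query per positive element plus a final confirming one, which illustrates why query counts are measured against the size of the target representation.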

An effective Boolean algebra
$\mathcal{A} = (\mathfrak{D}, \Psi, [\![\_]\!], \bot, \top, \vee, \wedge, \neg)$
naturally defines the concept class $2^{\mathfrak{D}}$ with representations in
$\Psi$ of predicates over the domain $\mathfrak{D}$. We will say that an algorithm
is a learning algorithm for the algebra $\mathcal{A}$ to denote a learning
algorithm that can efficiently learn predicates from the representation class
$\Psi$.


3 The MAT* Algorithm 


Our learning algorithm, MAT*, can be viewed as a symbolic version of the
TTT algorithm for learning DFAs [16], but without discriminator finalization.
The learning algorithm accepts as input a membership oracle $\mathcal{O}$, an
equivalence oracle $\mathcal{E}$, as well as a learning algorithm $\Lambda$ for the
underlying Boolean algebra used in the target s-FA $\mathcal{M}$. The algorithm
uses a classification tree [17] to generate a partition of $\mathfrak{D}^*$ into
equivalence classes which represent the states in the target s-FA. Once a tree is
obtained, we can use it to determine, for any word $w \in \mathfrak{D}^*$, the
state accessed by $w$ in $\mathcal{M}$, i.e., what state the automaton reaches
when reading the word $w$. Then, we build an s-FA model $\mathcal{H}$, using the
algebra learning algorithm $\Lambda$ to create models for each transition guard and
utilizing the classification tree in order to implement a membership oracle for
$\Lambda$. Once a model is generated, we check for equivalence and, given a
counterexample, we either update the classification tree with a new state and a
corresponding distinguishing string, or propagate the counterexample into one of
the instances of the algebra learning algorithm $\Lambda$. The structure of MAT* is
shown in Algorithm 1. In the rest of the section, we use the s-FA in Fig. 1 as a
running example for our algorithm.


Algorithm 1. s-FA-LEARN($\mathcal{O}$, $\mathcal{E}$, $\Lambda$) // s-FA learning algorithm
Require: $\mathcal{O}$: membership oracle, $\mathcal{E}$: equivalence oracle, $\Lambda$: algebra learning algorithm.
  $T \leftarrow$ InitializeClassificationTree($\mathcal{O}$)
  $S_\Lambda \leftarrow$ InitializeGuardLearners($T$, $\Lambda$)
  $\mathcal{H} \leftarrow$ GetSFAModel($T$, $S_\Lambda$, $\mathcal{O}$)
  while $\mathcal{E}(\mathcal{H}) \neq \mathbf{T}$ do
    $w \leftarrow$ GetCounterexample($\mathcal{H}$)
    $T, S_\Lambda \leftarrow$ ProcessCounterexample($T$, $S_\Lambda$, $w$, $\mathcal{O}$)
    $\mathcal{H} \leftarrow$ GetSFAModel($T$, $S_\Lambda$, $\mathcal{O}$)
  return $\mathcal{H}$


3.1 The Classification Tree 


The main data structure used by our learning algorithm is the classification
tree (CT) [17]. The classification tree is a tree data structure used to
store the access and distinguishing strings for the target s-FA so that all
internal nodes of the tree are labelled using a distinguishing string while
all leaves are labelled using access strings.


Fig. 1. An s-FA over equality algebra.


Definition 2. A classification tree $T = (V, L, E)$ is a binary tree such that:


- $V \subseteq \Sigma^*$ is the set of nodes.

- $L \subseteq V$ is the set of leaves.

- $E \subseteq V \times V \times \mathbb{B}$ is the transition relation. For
$(v, u, b) \in E$, we say that $v$ is the parent of $u$ and furthermore, if
$b = \mathbf{T}$ (resp. $b = \mathbf{F}$) we say that $u$ is the
$\mathbf{T}$-child (resp. $\mathbf{F}$-child).


Intuitively, given any internal node $v \in V$, any leaf $l_T$ reached by following
the $\mathbf{T}$-child of $v$ can be distinguished from any leaf $l_F$ reached by
the $\mathbf{F}$-child using $v$. In other words, the membership queries for
$l_T v$ and $l_F v$ produce different results, i.e.,
$\mathcal{O}(l_T v) \neq \mathcal{O}(l_F v)$.


Tree Initialization. To initialize the CT data structure, we use a membership
query on the empty word $\epsilon$. Then, we create a CT with two nodes, a root
node labelled with $\epsilon$ and one child also labelled with $\epsilon$. The
child of the root is either a $\mathbf{T}$-child or $\mathbf{F}$-child, according
to the result of the $\mathcal{O}(\epsilon)$ query.


[Figure 2. Panels: (a) Learned states and classification tree; (b) Failed
completeness check; (c) Failed determinism check.]

Fig. 2. (left) Classification tree and corresponding learned states for our running
example. (right) Two different instances of failed partition verification checks
that occurred during learning and their respective updates on the given
counterexamples (CE).


The sift Operation. The main operation performed using the classification tree
is an operation called sift which allows one to determine, for any input word $s$,
the state reached by $s$ in the target s-FA. The sift($s$) operation performs the
following steps:


1. Set the current node to be the root node of the tree and let $w$ be the label at
the root. Perform a membership query on the word $sw$.

2. Let $b = \mathcal{O}(sw)$. Select the $b$-child of the current node, let $w$ be
its label, and repeat step 2 until a leaf is reached.

3. Once a leaf is reached, return the access string with which the leaf is labelled.


Note that, until both children of the root node are added, we will have inputs
that may not end up in any leaf node. In these cases our sift operation will
return $\bot$ and MAT* will add the queried input as a new leaf in the tree.
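The steps above can be sketched as follows in Python. The tree encoding (`Node` with a `children` dictionary keyed by the query result), the toy target language, and the use of `None` for the undefined result are all assumptions made for illustration; strings over `{a, b}` stand in for words over an arbitrary domain.

```python
class Node:
    """Classification-tree node: internal nodes hold a distinguishing
    string, leaves hold an access string."""

    def __init__(self, label, children=None):
        self.label = label
        # None for a leaf; otherwise a dict mapping True/False to child nodes.
        self.children = children

    def is_leaf(self):
        return self.children is None


def sift(root, s, member):
    """Return the access string of the leaf that s sifts to, or None if the
    required child does not exist yet."""
    node = root
    while not node.is_leaf():
        b = member(s + node.label)      # membership query on s followed by the label
        node = node.children.get(b)     # follow the b-child
        if node is None:
            return None
    return node.label


# Toy target (an assumption): words whose number of 'a's is divisible by 3.
def member(word):
    return word.count("a") % 3 == 0


# Access strings "", "a", "aa"; distinguishing strings "" and "a".
tree = Node("", {
    True: Node(""),
    False: Node("a", {True: Node("aa"), False: Node("a")}),
})

# A freshly initialized tree: root and a single T-child, both labelled "".
initial_tree = Node("", {True: Node("")})
```

For instance, `sift(tree, "aaaa", member)` follows the F-child at the root and again at the inner node, ending in the leaf with access string `"a"`.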

Once a classification tree is obtained, we use it to simulate a membership
oracle for the underlying algebra learning algorithm $\Lambda$. This oracle is then
used to infer models for the transitions and eventually construct an s-FA model. In
Fig. 2 we show the classification tree and the corresponding states learned by
the MAT* algorithm during the execution on our running example from Fig. 1.


3.2 Building an s-FA Model 


Assume we are given a classification tree $T = (V, L, E)$. Our next task is to
use the tree along with the underlying algebra learning algorithm $\Lambda$ to
produce an s-FA model. The main idea is to spawn an instance of the $\Lambda$
algorithm for each potential transition and then use the classification tree to
answer membership queries posed by each $\Lambda$ instance. Initially, we define an
s-FA $\mathcal{H} = (\mathcal{A}, Q_{\mathcal{H}}, q_\epsilon, F_{\mathcal{H}}, \Delta_{\mathcal{H}})$,
where $Q_{\mathcal{H}} = \{q_s \mid s \in L\}$, i.e., we create one state
for each leaf of the classification tree $T$. Moreover, for any
$q_s \in Q_{\mathcal{H}}$, we have that $q_s \in F_{\mathcal{H}}$ if and only if
$\mathcal{O}(s) = \mathbf{T}$. Next, we will show how to build the transition
relation for $\mathcal{H}$. As mentioned above, our construction is based on the
idea of spawning instances of $\Lambda$ for each potential transition of the s-FA
and then using the classification tree to decide, for each character, if the
character satisfies the guard of the potential transition, thus answering
membership queries performed by the underlying algebra learner.


Guard Inference. To infer the set of guards in the transition relation
$\Delta_{\mathcal{H}}$, we spawn, for each pair of states
$(q_u, q_v) \in Q_{\mathcal{H}} \times Q_{\mathcal{H}}$, an instance
$\Lambda^{(q_u, q_v)}$ of the algebra learning algorithm. We answer membership
queries to $\Lambda^{(q_u, q_v)}$ as follows. Let $\alpha \in \mathfrak{D}$ be a
symbol queried by $\Lambda^{(q_u, q_v)}$. Then, we return $\mathbf{T}$ as the answer
to $\mathcal{O}(\alpha)$ if sift($u\alpha$) $= v$ and $\mathbf{F}$ otherwise. Once
$\Lambda^{(q_u, q_v)}$ submits an equivalence query $\mathcal{E}(\varphi)$ using a
model $\varphi$, we suspend the execution of the algorithm and add
the transition $(q_u, \varphi, q_v)$ in $\Delta_{\mathcal{H}}$.
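A minimal Python sketch of this simulated membership oracle is shown below. The helper `make_guard_oracle` is a hypothetical name, and `toy_sift` is a stand-in that computes the sift result directly for a toy language (number of 'a's modulo 3) instead of walking a classification tree.

```python
def make_guard_oracle(sift, u, v):
    """Membership oracle handed to the guard learner for the pair (q_u, q_v):
    a character satisfies the guard iff reading it after u sifts to v."""
    def oracle(a):
        return sift(u + a) == v
    return oracle


# Stand-in for sifting (an assumption): states are identified by the access
# strings "", "a", "aa", according to the number of 'a's modulo 3.
def toy_sift(s):
    return "a" * (s.count("a") % 3)


guard_eps_to_a = make_guard_oracle(toy_sift, "", "a")
guard_aa_to_eps = make_guard_oracle(toy_sift, "aa", "")
```

Each guard learner only ever sees single characters; the access string `u` and the target leaf `v` are fixed when the oracle is created.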


Partition Verification. Once all algebra learners have submitted a model through
an equivalence query, we have a complete transition relation
$\Delta_{\mathcal{H}}$. However, at this point there is no guarantee that for each
state $q$ the outgoing transitions from $q$ form a partition of the domain
$\mathfrak{D}$. Therefore, it may be the case that our s-FA model $\mathcal{H}$ is
in fact non-deterministic and, moreover, that certain symbols do
not satisfy any guard. Using such a model in an equivalence query would result
in an improper learning algorithm and potential problems in the counterexample
processing algorithm in Sect. 3.3. To mitigate this issue we perform the following
checks:


Determinism check: For each state $q_s \in Q_{\mathcal{H}}$ and each pair of moves
$(q_s, \varphi_1, q_u), (q_s, \varphi_2, q_v) \in \Delta_{\mathcal{H}}$, we verify
that $[\![\varphi_1 \wedge \varphi_2]\!] = \emptyset$. Assume that a character
$\alpha$ is found such that $\alpha \in [\![\varphi_1 \wedge \varphi_2]\!]$ and let
$m$ = sift($s\alpha$). Then, the guard of the transition $q_s \to q_m$ must be
satisfied by $\alpha$. Therefore, we check whether $m = u$ and whether $m = v$, and
provide $\alpha$ as a counterexample to $\Lambda^{(q_s, q_u)}$ and
$\Lambda^{(q_s, q_v)}$ respectively if the corresponding check fails.

Completeness check: For each state $q_u \in Q_{\mathcal{H}}$ let
$S = \{\varphi \mid (q_u, \varphi, p) \in \Delta_{\mathcal{H}}\}$.
We check that $[\![\bigvee_{\varphi \in S} \varphi]\!] = \mathfrak{D}$. If a symbol
$\beta \notin [\![\bigvee_{\varphi \in S} \varphi]\!]$ is found, then let
$v$ = sift($u\beta$). Following the same reasoning as above, we provide $\beta$ as
a counterexample to $\Lambda^{(q_u, q_v)}$.


These checks are iterated for each state until no more counterexamples are found.
In Fig. 2 we demonstrate instances of failed determinism and completeness checks
while learning our running example from Fig. 1 along with the corresponding
updates on the predicates. For details regarding the equality algebra learner, see
Sect. 5.
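The two checks can be sketched in Python for a small finite domain. This is only an illustration under strong assumptions: guards are plain sets of characters, "providing a counterexample to a learner" is modelled by directly adding or removing the character from the corresponding set, and `toy_sift` again computes the sift result for a toy language rather than using a real classification tree. The function names are hypothetical.

```python
from itertools import combinations


def find_overlap(guards, domain):
    """Determinism check: a character satisfying two guards, or None."""
    for (t1, g1), (t2, g2) in combinations(guards.items(), 2):
        for a in domain:
            if a in g1 and a in g2:
                return a, t1, t2
    return None


def find_gap(guards, domain):
    """Completeness check: a character satisfying no guard, or None."""
    for a in domain:
        if not any(a in g for g in guards.values()):
            return a
    return None


def verify_partition(s, guards, domain, sift):
    """Repair the guards out of state q_s until they partition the domain.
    guards maps a target access string to the set of characters it accepts."""
    while True:
        overlap = find_overlap(guards, domain)
        if overlap is not None:
            a, t1, t2 = overlap
            m = sift(s + a)             # the target that a really leads to
            for t in (t1, t2):
                if t != m:
                    guards[t].discard(a)   # counterexample for the wrong guard
            continue
        gap = find_gap(guards, domain)
        if gap is not None:
            guards.setdefault(sift(s + gap), set()).add(gap)
            continue
        return guards


def toy_sift(word):
    # Stand-in for sifting: number of 'a's modulo 3, as an access string.
    return "a" * (word.count("a") % 3)


guards = {"": {"a", "b"}, "a": {"a"}}
fixed = verify_partition("", guards, "abc", toy_sift)
```

In this run the overlap on `'a'` is resolved in favour of the target `"a"`, and the uncovered character `'c'` is then added to the guard leading back to `""`.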


Optimizing the Number of Algebra Learning Instances. Note that in the
description above, MAT* spawns one instance of $\Lambda$ for each possible
transition between states in $\mathcal{H}$. To reduce the number of spawned algebra
learning instances, we perform the following optimization: For each state $q_s$, we
initially spawn a single algebra learning instance $\Lambda^{(q_s, ?)}$. Let
$\alpha$ be the first symbol queried by $\Lambda^{(q_s, ?)}$ and let
$u$ = sift($s\alpha$). We return $\mathbf{T}$ as a query answer for $\alpha$ to
$\Lambda^{(q_s, ?)}$ and set the target state for the instance to $q_u$, i.e., we
convert the algebra learning instance to $\Lambda^{(q_s, q_u)}$. Afterwards, we keep
a set $R = \{q_v \mid v = \mathrm{sift}(s\beta)\}$ for all $\beta \in \mathfrak{D}$
queried by the different algebra learning instances and generate new instances
only for states $q_v \in R$ for which the guards are not yet inferred. Using this
optimization, the total number of generated algebra learning instances never exceeds
the number of transitions in the target s-FA.


3.3 Processing Counterexamples


For counterexample processing, we adapt the algorithm used in [6] to the setting
of MAT*. In a nutshell, our algorithm works similarly to the classic
Rivest-Schapire algorithm [23] and the TTT algorithm [16] for learning DFAs, where
a binary search is performed to locate the index in the counterexample where
the executions of the model automaton and the target one diverge. However,
once this breakpoint index is found, our algorithm performs further analysis to
determine if the divergence is caused by an undiscovered state in our model
automaton or because the guard predicate that consumes the breakpoint index
character is incorrect.


Error Localization. Let $w$ be a counterexample for a model $\mathcal{H}$ generated
as described above. For each index $i \in [0..|w|]$, let
$q_u = \mathcal{H}[w[..i]]$ be the state accessed by $w[..i]$ in $\mathcal{H}$ and
let $\gamma_i = u\,w[i+1..]$. In other words, $\gamma_i$ is obtained
by first running $w$ in $\mathcal{H}$ for $i$ steps and then concatenating the
access string for the state reached in $\mathcal{H}$ with the word $w[i+1..]$. Note
that, because initially the model $\mathcal{H}$ and the target s-FA start at the
same state accessed by $\epsilon$, the two machines are synchronized and therefore
$\mathcal{O}(\gamma_0) = \mathcal{O}(w)$. Moreover, since $w$ is a
counterexample, we have that $\mathcal{O}(\gamma_{|w|}) \neq \mathcal{O}(w)$. It
follows that there exists an index $j$, which we will refer to as the breakpoint,
for which $\mathcal{O}(\gamma_j) \neq \mathcal{O}(\gamma_{j+1})$. The
counterexample processing algorithm uses a binary search on the index $j$ to find
such a breakpoint. For more information on the correctness of this method we
refer the reader to [6,23].
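The binary search can be sketched as follows in Python, using 0-based slicing so that `w[:i]` is the prefix of length i and `w[i:]` is the remaining suffix. The function name `find_breakpoint`, the callback `run_hyp` (which returns the access string of the state the model reaches on a prefix), and the toy target and model are all assumptions made for illustration.

```python
def find_breakpoint(w, member, run_hyp):
    """Return j such that member(gamma(j)) != member(gamma(j + 1)), where
    gamma(i) is the access string of the model state reached on w[:i]
    followed by the rest of w. Requires w to be a counterexample."""
    def gamma(i):
        return run_hyp(w[:i]) + w[i:]

    target = member(w)                  # equals member(gamma(0))
    lo, hi = 0, len(w)                  # invariant: gamma(lo) agrees with w,
    while hi - lo > 1:                  #            gamma(hi) disagrees
        mid = (lo + hi) // 2
        if member(gamma(mid)) == target:
            lo = mid
        else:
            hi = mid
    return lo


# Toy target (an assumption): number of 'a's divisible by 3.
def member(word):
    return word.count("a") % 3 == 0


# Toy two-state model: state "" until the first 'a', then state "a" forever.
def run_hyp(prefix):
    return "a" if "a" in prefix else ""


j = find_breakpoint("aaa", member, run_hyp)
```

Here the model rejects `"aaa"` while the target accepts it, and the search returns `j = 1`: the model's transition out of the state reached after one `'a'` is the faulty one. The search uses a number of membership queries logarithmic in the length of the counterexample.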


Breakpoint Analysis. Once we find an index $j$ such that
$\mathcal{O}(\gamma_j) \neq \mathcal{O}(\gamma_{j+1})$ we can
conclude that the transition taken in $\mathcal{H}$ from $\mathcal{H}[w[..j]]$ with
the symbol $w[j+1]$ is incorrect. In traditional algorithms for learning DFAs, the
sole reason for having an incorrect transition would be that the transition is
actually directed to a yet undiscovered state in the target automaton. However, in
the symbolic setting we have to explore two different possibilities. Let
$q_u = \mathcal{H}[w[..j]]$ be the state accessed in $\mathcal{H}$ by $w[..j]$,
let $q_v$ with $v$ = sift($u\,w[j+1]$) be the result of sifting $u\,w[j+1]$ in
the classification tree, and consider the transition
$(q_u, \varphi, q_v) \in \Delta_{\mathcal{H}}$. We use the
guard $\varphi$ to determine if the counterexample was caused by an invalid
predicate guard or an undiscovered state in the target s-FA.


Case 1. Incorrect guard. Assume that $w[j+1] \notin \llbracket\phi\rrbracket$.
Note that $\phi$ was generated as a model by $\Lambda^{(q_u,q_v)}$ and therefore
a membership query from $\Lambda^{(q_u,q_v)}$ for a character $\alpha$ returns
$\top$ if $\mathit{sift}(u\alpha) = v$. Moreover, we have that
$\mathit{sift}(u\,w[j+1]) = v$. Therefore, if
$w[j+1] \notin \llbracket\phi\rrbracket$, then $w[j+1]$ is a counterexample for
the learning instance $\Lambda^{(q_u,q_v)}$ which produced $\phi$. We proceed to
supply $\Lambda^{(q_u,q_v)}$ with the counterexample $w[j+1]$, update the
corresponding guard and continue to generate a new s-FA model.


Fig. 3. (left) A minimal s-FA. (right) The s-FA corresponding to the
classification tree of MAT* with access strings for $q_{init}$ and $q_2$ and a
single distinguishing string $\epsilon$.


Case 2. Undiscovered state. Assume $w[j+1] \in \llbracket\phi\rrbracket$. It
follows that $\phi$ is behaving as expected on the symbol $w[j+1]$ based on the
current classification tree. We conclude that the state accessed by $w[..j+1]$
is in fact an undiscovered state in the target s-FA which we have to distinguish
from the previously discovered states. Therefore, we proceed to add a new leaf
in the tree to access this state. More specifically, we replace the leaf
labelled with $v$ with a sub-tree consisting of three nodes: the root is the
word $w[j+1..]$, which is the distinguishing string for the states accessed by
$v$ and $u\,w[j+1]$. The T-child and F-child of this node are labelled with the
words $v$ and $u\,w[j]$ based on the results of $O(v)$ and $O(u\,w[j+1])$.

Finally, we have to take care of one last point: once we add another state to
the classification tree, certain queries that were previously directed to $v$
may be directed to $u\,w[j]$ once we sift them down the tree. This change
implies that certain previous queries performed by the algebra learning
instances $\Lambda^{(q_s,q_v)}$ may have been given invalid results and
therefore we can no longer guarantee the correctness of the generated
predicates. To solve this problem, we terminate all instances
$\Lambda^{(q_s,q_v)}$ for all $q_s \in Q_H$ and replace them with fresh
instances of the algebra learning algorithm.
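The sifting and leaf-splitting steps used in Case 2 can be illustrated with a small sketch. All names here (`Leaf`, `Inner`, `sift`, `split_leaf`) are hypothetical, words are plain Python strings, and the sketch assumes the usual discrimination-tree convention that a word is routed at an inner node by querying the word concatenated with that node's distinguishing string; the paper's own data structures may differ.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    access: str            # access string of a discovered state


@dataclass
class Inner:
    distinguisher: str     # distinguishing string stored at this node
    t_child: "Node"        # subtree for words accepted with the distinguisher
    f_child: "Node"        # subtree for words rejected with the distinguisher


Node = Union[Leaf, Inner]


def sift(node: Node, word: str, oracle: Callable[[str], bool]) -> Leaf:
    """Route `word` down the tree, one membership query per inner node."""
    while isinstance(node, Inner):
        accepted = oracle(word + node.distinguisher)
        node = node.t_child if accepted else node.f_child
    return node


def split_leaf(root: Node, old: str, new: str, dist: str,
               oracle: Callable[[str], bool]) -> Node:
    """Replace the leaf `old` by a three-node subtree rooted at `dist`."""
    old_answer, new_answer = oracle(old + dist), oracle(new + dist)
    if old_answer == new_answer:
        raise ValueError("dist does not distinguish the two access strings")

    def rebuild(node: Node) -> Node:
        if isinstance(node, Leaf):
            if node.access != old:
                return node
            if old_answer:
                return Inner(dist, Leaf(old), Leaf(new))
            return Inner(dist, Leaf(new), Leaf(old))
        return Inner(node.distinguisher,
                     rebuild(node.t_child), rebuild(node.f_child))

    return rebuild(root)
```

After a split, words that used to reach the old leaf may now reach the new one, which is why previously answered queries of the affected guard learners can become invalid.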


4 Correctness and Completeness of MAT*


Given a learning algorithm $\Lambda$, we use $C_m^{\Lambda}(n)$ to denote the
number of membership queries and $C_e^{\Lambda}(n)$ to denote the number of
equivalence queries performed by $\Lambda$ for a target concept with
representation size $n$. In our analysis we will also use the following
definitions:


Definition 3. Let $M = (\mathcal{A}, Q, q_0, F, \Delta)$ be an s-FA over a
Boolean algebra $\mathcal{A}$ and let $S \subseteq \Psi_{\mathcal{A}}$. Then,
we define:


- The maximum size of the union of predicates in $S$ as
  $U(S) = \max_{\Phi \subseteq S} |\bigvee_{\phi \in \Phi} \phi|$.
- The maximum guard union size for $M$ as
  $B(M) = \max_{q \in Q} U(\mathit{guard}(q))$.


The value $B(M)$ denotes the maximum size that a predicate guard may
take in any intermediate hypothesis produced by MAT* during the learning
process. Contrary to traditional L*-style algorithms, the size of the
intermediate hypothesis produced by MAT* may fluctuate, as we demonstrate in
the following example.


Example 2. Consider the s-FA on the left side of Fig. 3. When we execute the
MAT* algorithm on this s-FA, and after an access string for $q_2$ is added to
the classification tree, the tree will correspond to the s-FA shown on the
right, in which the transition from $q_{init}$ is taken over the union of the
individual transitions in the target. Certain sequences of answers to
equivalence queries can force MAT* to first learn a correct model of
$\phi_1 \vee \phi_2 \vee \phi_3$ before revealing a new state in the target
s-FA.


We now state the correctness and query complexity of our algorithm. 


Theorem 1. Let $M = (\mathcal{A}, Q, q_0, F, \Delta)$ be an s-FA, $\Lambda$ be a
learning algorithm for $\mathcal{A}$, and let $k = B(M)$. Then, MAT* will learn
$M$ using $\Lambda$ with
$O(|Q|^2 |\Delta| C_m^{\Lambda}(k) + |Q|^2 |\Delta| C_e^{\Lambda}(k) \log m)$
membership and $O(|Q| |\Delta| C_e^{\Lambda}(k))$ equivalence queries, where
$m$ is the length of the longest counterexample given to MAT*.


Proof. First, we note that our counterexample processing algorithm only splits
a leaf if there exists a valid distinguishing condition separating the two newly
generated leaves. Therefore, the number of leaves in the discrimination tree is
always at most $|Q|$. Next, note that each counterexample is processed using a
binary search with complexity $O(\log m)$ to detect the breakpoint and,
afterwards, either a new state is added or a counterexample is dispatched to the
corresponding algebra learner.

Each classification tree $T = (V, L, E)$ defines a partition over
$\mathfrak{D}^*$ and, therefore, an s-FA $H_T$. In the worst case, MAT* will
learn $H_T$ exactly before a new state in the target s-FA is revealed through an
equivalence query. Since $H_T$ is the result of merging states in the target
s-FA, we conclude that the size of each predicate in $H_T$ is at most $k$. It
follows that, for each classification tree $T$, we can get at most
$|\Delta_{H_T}| C_e^{\Lambda}(k)$ counterexamples until a new state is uncovered
in the target s-FA. Note here that our counterexample processing algorithm
ensures that each counterexample will either be a valid counterexample for a
predicate guard in $H_T$ or uncover a new state. For each membership query
performed by an underlying algebra learner, we have to sift a string in the
classification tree, which requires at most $|Q|$ membership queries. Therefore,
the total number of membership queries performed for each candidate model $H$ is
bounded by $O(|\Delta| (|Q| C_m^{\Lambda}(k) + C_e^{\Lambda}(k) \log m))$, where
$m$ is the size of the longest counterexample so far. The number of equivalence
queries is bounded by $O(|\Delta| C_e^{\Lambda}(k))$. When a new state is
uncovered, we assume that, in the worst case, all the algebra learners will be
restarted (this is an overestimation) and therefore the same process will be
repeated at most $|Q|$ times, giving us the stated bounds.

Note that the bounds on the number of queries stated in Theorem 1 are based
on the worst-case assumption that we may have to restart all guard learning
instances each time we discover a new state. In practice, we expect these bounds
to be closer to
$O(|\Delta| C_m^{\Lambda}(k) + (|\Delta| C_e^{\Lambda}(k) + |Q|) \log m)$
membership and $O(|\Delta| C_e^{\Lambda}(k) + |Q|)$ equivalence queries.


Minimality of Learned s-FA. Since MAT* only adds a new state to the s-FA if a
distinguishing sequence is found, it follows that the total number of states in
the s-FA is minimal. Moreover, MAT* does not modify in any way the predicates
returned by the underlying algebra learning instances. Therefore, if the size of
the predicates returned by the $\Lambda$ instances is minimal, MAT* will
maintain their minimality.

The following theorem shows that it is indeed not possible to learn s-FAs 
over a Boolean algebra that is not itself learnable. 


Theorem 2. Let $\Lambda^{\text{s-FA}}$ be an efficient learning algorithm for
the algebra of s-FAs over a Boolean algebra $\mathcal{A}$. Then, the Boolean
algebra $\mathcal{A}$ is efficiently learnable.


Which s-FAs Are Efficiently Learnable? Theorem 2 shows that efficient 
learnability of an s-FA requires efficient learnability of the underlying algebra. 
Moreover, from Theorem 1 it follows that efficient learnability using MAT*
depends on the following property of the underlying algebra:


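Corollary 1. Let $\mathcal{A}$ be an efficiently learnable Boolean algebra and
consider the class $R_{\mathcal{A}}$ of s-FAs over $\mathcal{A}$. Then,
$R_{\mathcal{A}}$ is efficiently learnable using MAT* if and only if, for any
set $S \subseteq \Psi_{\mathcal{A}}$ such that for any distinct
$\phi, \psi \in S \Rightarrow \llbracket \phi \wedge \psi \rrbracket = \emptyset$,
we have that $U(S) = \mathrm{poly}(|S|, \max_{\phi \in S} |\phi|)$.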


At this point we would like to point out that the above condition arises because
MAT* is a congruence-based algorithm that successively computes hypothesis
automata by refining a set of access and distinguishing strings, a
characteristic common to all L*-based algorithms. Therefore, this limitation of
MAT* is expected to be shared by any other algorithm in the same family. Given
that, after three decades of research, L*-based algorithms are the only known
provably efficient algorithms for learning DFAs (and subsequently s-FAs), we
expect that expanding the class of learnable s-FAs is a very challenging task.


5  Learnable Boolean Algebras 


We will now describe a number of interesting effective Boolean algebras which 
are efficiently learnable using membership and equivalence queries. 


Boolean Algebras Over Finite Domains. Let $\mathcal{A}$ be any Boolean algebra
over a finite domain $\mathfrak{D}$. Then, any predicate $\phi \in \Psi$ can be
learned using $|\mathfrak{D}|$ membership queries. More specifically, the
learning algorithm constructs a predicate $\phi$


438 G. Argyros and L. D’Antoni 


accepting all elements in $\mathfrak{D}$ for which the membership queries return
true as $\phi = \{c \mid c \in \mathfrak{D} \wedge O(c) = \top\}$. Plugging this
algebra learning algorithm into our algorithm, we get the TTT learning algorithm
for DFAs without discriminator finalization [16]. This simple example
demonstrates that algorithms for DFAs can be viewed as special cases of our s-FA
learning algorithm for finite domains.
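As a minimal sketch of the finite-domain learner just described, assuming the domain is given as a Python iterable and the membership oracle as a callable (both names are illustrative and not taken from the symbolicautomata library):

```python
from typing import Callable, FrozenSet, Hashable, Iterable


def learn_finite_predicate(
    domain: Iterable[Hashable],
    membership_oracle: Callable[[Hashable], bool],
) -> FrozenSet[Hashable]:
    """Learn a predicate over a finite domain, represented as its set of
    satisfying elements, using exactly one membership query per element."""
    return frozenset(c for c in domain if membership_oracle(c))
```

The learner needs no equivalence queries, but its cost grows linearly with the domain size, which is what makes large alphabets such as UTF-16 expensive for DFA learners.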


Equality Algebra. Consider the equality algebra defined in Example 1. Predicates
in this algebra of size $|\phi| = k$ can be learned using $2k$ equivalence
queries and no membership queries. Initially, the algorithm outputs the empty
set $\bot$ as a hypothesis. In any subsequent step, the algorithm keeps the
counterexamples obtained so far in two sets $P, N \subseteq \mathfrak{D}$ such
that $P$ holds all the positive examples received so far and $N$ holds all the
negative examples. Afterwards, the algorithm finds the smallest hypothesis
consistent with the counterexamples given. This hypothesis can be found
efficiently as follows:


1. If $|P| > |N|$ then $\phi = \lambda c. \neg(\bigvee_{d \in N} c = d)$.
2. If $|P| \le |N|$ then $\phi = \lambda c. (\bigvee_{d \in P} c = d)$.


It can be easily shown that the algorithm will find a correct hypothesis after
at most $2k$ equivalence queries.
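The two rules above can be sketched as a small learner driven only by equivalence queries. This is an illustration under stated assumptions: the function names are hypothetical, predicates are modelled as Python callables, and the equivalence oracle is simulated by scanning a small finite domain, which a real teacher would not do.

```python
from typing import Callable, Iterable, Optional, Tuple

Predicate = Callable[[int], bool]


def make_equivalence_oracle(
    target: Predicate, domain: Iterable[int]
) -> Callable[[Predicate], Optional[int]]:
    """Simulated teacher: return the first disagreeing element, or None."""
    elements = list(domain)

    def oracle(hypothesis: Predicate) -> Optional[int]:
        for c in elements:
            if hypothesis(c) != target(c):
                return c
        return None

    return oracle


def learn_equality_predicate(
    equivalence_oracle: Callable[[Predicate], Optional[int]],
) -> Tuple[Predicate, int]:
    """Return the learned predicate and the number of equivalence queries."""
    positives, negatives = set(), set()
    queries = 0
    while True:
        if len(positives) > len(negatives):
            # Rule 1: accept everything except the known negative examples.
            excluded = frozenset(negatives)
            hypothesis = lambda c, excluded=excluded: c not in excluded
        else:
            # Rule 2: accept exactly the known positive examples.
            included = frozenset(positives)
            hypothesis = lambda c, included=included: c in included
        queries += 1
        counterexample = equivalence_oracle(hypothesis)
        if counterexample is None:
            return hypothesis, queries
        if hypothesis(counterexample):
            negatives.add(counterexample)   # wrongly accepted
        else:
            positives.add(counterexample)   # wrongly rejected
```

Every hypothesis is consistent with all examples seen so far, so each counterexample is a new element and the loop makes progress on every query.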


Other Algebras. The following Boolean algebras can be efficiently learned using 
membership and equivalence queries. All these algebras also have approximate 
fingerprints [3], which means that they are not learnable by equivalence queries 
alone. Thus, s-FAs over these algebras are not efficiently learnable by previous 
s-FA learning algorithms [6,11]. 


BDD algebra. The algebra of ordered binary decision diagrams (OBDDs) is 
efficiently learnable using a variant of the L* algorithm [22]. 

Tree automata algebra. Deterministic finite tree automata form an algebra 
which is also learnable using membership and equivalence queries [13]. 

s-FA algebra. s-FAs themselves form an effective Boolean algebra and there- 
fore, s-FAs over s-FAs over learnable algebras are also learnable. 


6 Evaluation 


We have implemented MAT* in the open-source symbolicautomata library [1],
as well as the learning algorithms for Boolean algebras over finite domains,
equality algebras and BDD algebras discussed in Sect. 5. Our implementation is
fully modular: once an algebra learning algorithm is defined in our library, it
can be seamlessly plugged in as a guard learning algorithm for s-FAs. Since MAT*
is also an algebra learning algorithm, this allows us to easily learn automata
over automata. All experiments were run on a MacBook Air with a 1.8 GHz
Intel Core i5 and 8 GiB of memory. The goal of our evaluation is to answer the
following research questions:


Q1: How does MAT* perform on automata over large finite alphabets?
(Subsect. 6.1)

Q2: How does MAT* perform on automata over algebras that require both
membership and equivalence queries? (Subsect. 6.2)

Q3: How does the size of predicates affect the performance of MAT*?
(Subsect. 6.3)


Table 1. Evaluation of MAT* on regular expressions.

ID    | |Q| | |Δ| |    Memb | Equiv | R-CE |   GU |   D-CE |   C-CE
------+-----+-----+---------+-------+------+------+--------+-------
RE.1  |  11 |  35 |     653 |    17 |   19 |   25 |    106 |     78
RE.2  |  24 | 113 |    7203 |    66 |   45 |   87 |    565 |    479
RE.3  |  11 |  15 |     483 |    11 |   16 |   16 |     59 |     45
RE.4  |  18 |  40 |    1745 |    17 |   33 |   32 |    188 |    164
RE.5  |  25 |  55 |    3180 |    22 |   48 |   45 |    244 |    211
RE.6  |  52 | 155 |   43737 |   588 |  104 |  640 |   3102 |   2953
RE.7  | 179 | 658 |   66477 |  1486 |   91 | 1398 |   7748 |   6540
RE.8  | 115 | 175 |  929261 |   299 |  206 |  390 |  28606 |  28354
RE.9  | 144 | 369 |  844213 |   699 |  261 |  817 |  30485 |  30135
RE.10 | 175 | 551 | 3228102 |  5346 |  286 | 5457 | 172180 | 170483
RE.11 |   6 |   9 |    3409 |   281 |   14 |  289 |    723 |    710
RE.12 |  10 |  14 |    1367 |    88 |    8 |   86 |    314 |    291
RE.13 |  29 |  46 |   20903 |   743 |   49 |  164 |   2637 |   2550
RE.14 |   8 |  13 |    5949 |   365 |   24 |  381 |    854 |    836
RE.15 |   8 |  15 |     661 |    82 |    2 |   76 |    228 |    198


6.1 Equality Algebra Learning 


In this experiment, we use MAT* to learn s-FAs obtained from 15 regular
expressions drawn from 3 domains: (1) regular expressions used in web
application sanitization frameworks such as the CodeIgniter framework,
(2) regular expressions drawn from the popular web application firewall
ModSecurity, and (3) regular expressions from [18]. For this set of experiments
we use the entire UTF-16 character set ($2^{16}$ characters) as the alphabet and
the equality algebra to represent predicates. Since the alphabet is finite, we
also tried learning the same automata using TTT [16], the most efficient
algorithm for learning finite automata over finite alphabets.


Results. Table 1 presents the results of MAT*. The Memb and Equiv columns
present the number of distinct membership and equivalence queries, respectively.
The R-CE column shows how many times a counterexample was reused, while
the GU column shows the number of counterexamples that were used to update
an underlying predicate (as opposed to adding a new state in the s-FA). Finally,
D-CE shows the number of counterexamples provided to an underlying algebra
learner due to failed determinism checks, while C-CE shows the number of
counterexamples due to failed completeness checks. Note that these
counterexamples did not require invoking the equivalence oracle.

Given the large alphabet sizes, TTT runs out of memory on all our benchmarks.
This is not surprising since the number of queries required by TTT just to
construct the correct model for a DFA with $128 = 2^7$ states is at least
$|\Sigma| |Q| \log |Q| = 2^{16} \times 2^7 \times 7 \approx 2^{26}$. We point
out that a corresponding lower bound of $\Omega(|Q| \log |Q| \, |\Sigma|)$
exists for the number of queries any DFA algorithm may perform and therefore
the size of the alphabet provides a fundamental limitation for any such
algorithm.
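The arithmetic behind this estimate can be checked directly. The snippet below is only a sanity check of the stated product, with illustrative variable names:

```python
import math

alphabet_size = 2 ** 16          # UTF-16 code units
num_states = 2 ** 7              # 128 states
log_states = int(math.log2(num_states))

# |Sigma| * |Q| * log|Q| membership queries
query_estimate = alphabet_size * num_states * log_states
query_exponent = math.log2(query_estimate)

print(query_estimate)            # 58720256
print(round(query_exponent, 1))  # 25.8, i.e. roughly 2**26
```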


Analysis. First, we observe that the performance of the algorithm is not always
monotone in the number of states or transitions of the s-FA. For example, RE.10
requires more than 10x more membership and equivalence queries than RE.7
despite the fact that both the number of states and transitions of RE.10 are
smaller. In this case, RE.10 has fewer transitions, but they contain predicates
that are harder to learn, e.g., large character classes. Second, the
completeness check and the corresponding counterexamples are useful not only to
ensure that the generated guards form a partition but also to restore predicates
after new states are discovered. Recall that, once we discover (split) a new
state, a number of learning instances are discarded. Usually, the newly created
learning instances will simply output $\bot$ as the initial hypothesis. At this
point, completeness counterexamples are used to update the newly created
hypotheses accordingly and thus save MAT* from having to rerun a large number of
equivalence queries. Finally, we point out that the equality algebra learner
made no special assumptions about the structure of the predicates, such as
recognizing the character classes used in regular expressions. We expect that
providing such heuristics can greatly improve the performance of MAT* on these
benchmarks.


6.2 BDD Algebra Learning 


In this experiment, we use MAT* to learn s-FAs over a BDD algebra. We run
MAT* on 1,500 automata obtained by transforming Linear Temporal Logic
over finite traces into s-FAs [9]. The formulas have 4 atomic propositions and
the height of each BDD used by the s-FAs is four. To learn the underlying BDDs
we use MAT* with the learning algorithm for algebras over finite domains (see
Sect. 5), since ordered BDDs can be seen as s-FAs over $\mathfrak{D} = \{0, 1\}$.

Figure 4 shows the number of membership (top left) and equivalence (top
right) queries performed by MAT* for s-FAs with different numbers of states.
For these s-FAs, MAT* is highly efficient with respect to both the number of
membership and equivalence queries, scaling linearly with the number of states.
Moreover, we note that the size of the set of transitions $|\Delta|$ does not
drastically affect the overall performance of the algorithm. This is in
agreement with the results presented in the previous section, where we argued
that the difficulty of the underlying predicates, and not their number, is the
primary factor affecting performance.


Fig. 4. (Top) Evaluation of MAT* on s-FAs over a BDD algebra. (Bottom)
Evaluation of MAT* on s-FAs over an s-FA algebra. For an s-FA $M_{m,n}$, the
x-axis denotes the values of $n$. Different lines correspond to different values
of $m$.


6.3 s-FA Algebra Learning 


In this experiment, we use MAT* to learn 18 s-FAs over s-FAs, which accept
strings of strings. We evaluate the scalability of our algorithms when the
difficulty of learning the underlying predicates increases. The possible
internal s-FAs, which we will use as predicates, operate over the equality
algebra and are denoted as $I_k$ (where $2 \le k \le 17$). Each s-FA $I_k$
accepts exactly one word $a \cdots a$ of length $k$ and has $k + 1$ states and
$2k + 1$ transitions. The external s-FAs are denoted as $M_{m,n}$ (where
$m \in \{5, 10, 15\}$ and $2 \le n \le 17$). Each s-FA $M_{m,n}$ accepts
exactly one word $s \cdots s$ of length $m$ where each $s$ is accepted by $I_n$.


Analysis. For simplicity, let us assume that we have the s-FA $M_{n,n}$.
Consider a membership query performed by one of the underlying algebra learning
instances. Answering the membership query requires sifting a sequence in the
classification tree of height at most $n$, which requires $O(n)$ membership
queries. Therefore, the number of membership queries required to learn each
individual predicate is increased by a factor of $O(n)$. Moreover, for each
equivalence query performed by an algebra learning instance, the s-FA learning
algorithm has to pinpoint the counterexample to the specific algebra learning
instance, a process which requires $\log m$ membership queries, where $m$ is the
length of the counterexample. Therefore, we conclude that each underlying guard
with $n$ states will require a number of membership queries which is of the
order of $O(n^3)$ at the worst and $O(n^2 \log n)$ queries at the best (since
the CT has height $\Omega(\log n)$), ignoring the queries required for
counterexample processing.

Figure 4 shows the number of membership (bottom left) and equivalence
(bottom right) queries, which agree with the theoretical analysis presented in
the previous paragraph. Indeed, the number of membership queries increases
sharply, growing about quadratically in the number of states in the underlying
guards. On the other hand, equivalence queries are not affected so drastically
and only increase linearly.


7 Related Work 


Learning Finite Automata. The L* algorithm proposed by Dana Angluin [3] was
the first to introduce the notion of a minimally adequate teacher (i.e.,
learning using membership and equivalence queries) and was also the first
algorithm for learning finite automata in polynomial time. Following Angluin's
result, L* has been studied extensively [16,17], has been extended to many other
models (e.g., to nondeterministic automata [12] and alternating automata [4]),
and has found many applications in program analysis [2,5-7,24] and program
synthesis [25]. Since finite automata only operate over finite alphabets, all
the automata that can be learned using variants of L* can also be learned using
MAT*.


Learning Symbolic Automata. The problem of scaling L* to large alphabets was
initially studied outside the setting of s-FAs using alphabet abstractions
[14,15]. The first algorithm for symbolic automata over ordered alphabets was
proposed in [20], but the algorithm assumes that the counterexamples provided to
the learning algorithm are of minimal length. Argyros et al. [6] proposed the
first algorithm for learning symbolic automata in the standard MAT model and
also described the algorithm to distinguish counterexamples leading to new
states from counterexamples due to invalid predicates, which we adapt in MAT*.
Drews and D'Antoni [11] proposed a symbolic extension to the L* algorithm,
gave a general definition of learnability and demonstrated more learnable
algebras such as union and product algebras. The algorithms in [6,11,19] are all
extensions of L* and assume the existence of an underlying learning algorithm
capable of learning partitions of the domain from counterexamples. MAT* does
not require that the predicate learning algorithms are able to learn partitions,
thus allowing existing learning algorithms for Boolean algebras to be plugged in
easily. Moreover, MAT* allows the underlying algebra learning algorithms to
perform both equivalence and membership queries, a capability not present in any
previous work, thus expanding the class of s-FAs which can be efficiently
learned.


Learning Other Models. Argyros et al. [6] and Botincan et al. [7] presented
algorithms for learning restricted families of symbolic transducers (i.e.,
symbolic automata with outputs). Other algorithms can learn nominal [21] and
register automata [8]. In these models, the alphabet is infinite but not
structured (i.e., it does not form a Boolean algebra) and characters at
different positions can be compared using binary relations.


Abstract. We address the problem of analyzing the reachable set of a
polynomial nonlinear continuous system by over-approximating the flowpipe
of its dynamics. The common approach to tackle this problem is to
perform a numerical integration over a given time horizon based on Taylor
expansion and interval arithmetic. However, this method turns out to be
very conservative when there is a large difference in speed between
trajectories as time progresses. In this paper, we propose to use combinations
of barrier functions, which we call piecewise barrier tubes (PBT), to
over-approximate the flowpipe. The basic idea of PBT is that for each segment of
a flowpipe, a coarse box which is big enough to contain the segment is
constructed using sampled simulation and then in the box we compute
by linear programming a set of barrier functions (called barrier tube or
BT for short) which work together to form a tube surrounding the flowpipe.
The benefit of using PBT is that (1) BT is independent of time and
hence can avoid being stretched and deformed by time; and (2) a small
number of BTs can form a tight over-approximation for the flowpipe,
which means that the computation required to decide whether the BTs
intersect the unsafe set can be reduced significantly. We implemented a
prototype called PBTS in C++. Experiments on some benchmark systems
show that our approach is effective.


1 Introduction 


Hybrid systems [17] are widely used to model dynamical systems which exhibit
both discrete and continuous behaviors. The reachability analysis of hybrid
systems has been a challenging problem over the last few decades. The hard core
of this problem lies in dealing with the continuous behavior of systems that are
described by ordinary differential equations (ODEs). Although there are
currently several quite efficient and scalable approaches for reachability
analysis of linear systems [8-10,14,16,19,20,26,34], nonlinear ODEs are much
harder to handle and the current approaches can be categorized into the
following groups.


Invariant Generation [18,21,22,27,28,36,37,39]. An invariant $I$ for a system
$S$ is a set such that any trajectory of $S$ originating from $I$ never escapes
from $I$. Therefore, finding an invariant $I$ such that the initial set
$I_0 \subseteq I$ and the unsafe set $U \cap I = \emptyset$ indicates the safety
of the system. In this way, there is no need to compute the flowpipe. The main
problem with invariant generation is that it is hard to define a set of
high-quality constraints which can be solved efficiently.


Abstraction and Hybridization [2,11,24,31,35]. The basic idea of the
abstraction-based approach is to first construct a linear model which
over-approximates the original nonlinear dynamics and then apply techniques for
linear systems to the abstraction model. However, how to construct an
abstraction with the fewest discrete states and sufficiently high accuracy is
still a challenging issue.


Satisfiability Modulo Theory (SMT) Over Reals [6,7,23]. This approach encodes
the reachability problem for nonlinear systems as first-order logic formulas
over the real numbers. These formulas can be solved using, for example,
$\delta$-complete decision procedures that overcome the theoretical limits in
nonlinear theories over the reals by choosing a desired precision $\delta$. An
SMT solver implementing such procedures can return either unsat if the
reachability problem is unsatisfiable or $\delta$-sat if the problem is
satisfiable given the chosen precision. The $\delta$-sat verdict does not
guarantee that the dynamics of the system will reach a particular region. It may
happen that, by increasing the precision, the problem becomes unsat. In general,
the limitation of this approach is that it does not provide a complete and
comprehensive description of the reachable set.


Bounded Time Flowpipe Computation [1,3-5,25,32]. The common technique
to compute a bounded flowpipe is based on the interval method or the Taylor
model. The interval-based approach is quite efficient even for high-dimensional
systems [29], but it suffers from the wrapping effect of intervals and can
quickly accumulate over-approximation errors. In contrast, the
Taylor-model-based approach is more precise in that it uses a vector of
polynomials plus a vector of small intervals to symbolically represent the
flowpipe. However, for the purpose of safety verification or reachability
analysis, the Taylor model has to be further over-approximated by intervals,
which may bring back the wrapping effect. In particular, the wrapping effect can
explode easily when the flowpipe segment over a time interval is stretched
drastically due to a large difference in speed between individual trajectories.
This case is demonstrated by the following example.


Example 1 (Running example). Consider the 2D system [30] described by
$\dot{x} = y$ and $\dot{y} = x^2$. Let the initial set $X_0$ be a line segment
$x \in [1.0, 1.0]$ and $y \in [-1.05, -0.95]$. Fig. 1a shows the simulation
result on three points in $X_0$ over the time interval $[0, 6.6]$. The reachable
set at $t = 6.6$ s is a smooth curve connecting the end points of the three
trajectories. As can be seen, the trajectory originating from the top is left
far behind the one originating from the bottom, which means that the tiny
initial line segment is being stretched into a huge curve very quickly, while
the width of the flowpipe is actually converging to 0. As a result, the interval
over-approximation of this huge curve can be extremely conservative even if its
Taylor model representation is precise, and reducing the time step size is not
helpful. To prove this point, we computed with Flow* [3] a Taylor model series
for the time horizon of 6.6 s which consists of 13200 Taylor models. Figure 1b
shows the interval approximation of the Taylor model series, which apparently
starts exploding.


Fig. 1. (a) Simulation for Example 1 showing the flowpipe segment being
extremely stretched and deformed. (b) Interval over-approximation of the Taylor
model computed by Flow* [3].
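The stretching effect in Example 1 can be reproduced with a plain simulation. The sketch below is not the authors' tooling: it uses a fixed-step fourth-order Runge-Kutta integrator with an assumed step size of 0.001, and all function names are illustrative.

```python
import math


def vector_field(x: float, y: float):
    """Dynamics of Example 1: dx/dt = y, dy/dt = x^2."""
    return y, x * x


def rk4_step(x: float, y: float, h: float):
    """One classical Runge-Kutta step of size h."""
    k1x, k1y = vector_field(x, y)
    k2x, k2y = vector_field(x + 0.5 * h * k1x, y + 0.5 * h * k1y)
    k3x, k3y = vector_field(x + 0.5 * h * k2x, y + 0.5 * h * k2y)
    k4x, k4y = vector_field(x + h * k3x, y + h * k3y)
    return (x + h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6,
            y + h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6)


def simulate(x0: float, y0: float, horizon: float = 6.6, h: float = 0.001):
    """Return the state reached from (x0, y0) after `horizon` time units."""
    x, y = x0, y0
    for _ in range(round(horizon / h)):
        x, y = rk4_step(x, y, h)
    return x, y


def endpoint_gap(y_low: float = -1.05, y_high: float = -0.95) -> float:
    """Distance at the horizon between the trajectories that start at the
    two ends of the initial segment x = 1, y in [y_low, y_high]."""
    xa, ya = simulate(1.0, y_low)
    xb, yb = simulate(1.0, y_high)
    return math.hypot(xa - xb, ya - yb)


if __name__ == "__main__":
    print("initial gap:", 0.1)
    print("gap at t = 6.6:", endpoint_gap())
```

Comparing the printed gap with the initial segment length of 0.1 shows how far the end points drift apart over the horizon, which is the deformation that interval enclosures have to cover.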


In this paper, we propose to use piecewise barrier tubes (PBTs) to
over-approximate flowpipes of polynomial nonlinear systems, which can avoid the
issue caused by the excessive stretching of a flowpipe segment. The idea of PBT
is inspired by barrier certificates [22,33]. A barrier certificate
$B(\boldsymbol{x})$ is a real-valued function such that
(1) $B(\boldsymbol{x}) > 0$ for all $\boldsymbol{x}$ in the initial set $X_0$;
(2) $B(\boldsymbol{x}) < 0$ for all $\boldsymbol{x}$ in the unsafe set $X_U$;
(3) no trajectory can escape from
$\{\boldsymbol{x} \in \mathbb{R}^n \mid B(\boldsymbol{x}) > 0\}$ through the
boundary $\{\boldsymbol{x} \in \mathbb{R}^n \mid B(\boldsymbol{x}) = 0\}$. A
sufficient condition for this constraint is that the Lie derivative of
$B(\boldsymbol{x})$ w.r.t. the dynamics $\dot{\boldsymbol{x}} = f$ is positive
all over the invariant region, i.e., $\mathcal{L}_f B(\boldsymbol{x}) > 0$,
which means that all the trajectories must move in the increasing direction of
the level sets of $B(\boldsymbol{x})$.
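As a worked instance of the sufficient condition, assuming the standard definition of the Lie derivative, $\mathcal{L}_f B = \sum_i \frac{\partial B}{\partial x_i} f_i$, the requirement for the running system of Example 1 ($\dot{x} = y$, $\dot{y} = x^2$) reads as follows. The candidate $B$ is left generic here and is not taken from the paper.

```latex
\mathcal{L}_f B(x, y)
  = \frac{\partial B}{\partial x}\,\dot{x} + \frac{\partial B}{\partial y}\,\dot{y}
  = y\,\frac{\partial B}{\partial x} + x^{2}\,\frac{\partial B}{\partial y} > 0 .
```

For a polynomial $B$ with unknown coefficients, this expression is again a polynomial that is linear in those coefficients, which is what makes a linear programming formulation plausible.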

Barrier certificates can be used to verify safety properties without computing 
the flowpipe explicitly. The essential idea is to use the zero level set of B(x) as 
a barrier to separate the flowpipe from the unsafe set. Moreover, if the unsafe 
set is very close to the boundary of the flowpipe, the barrier has to fit the shape 
of the flowpipe to make sure that all components of the constraint are satisfied. 
However, the zero level set of a polynomial of fixed degree may not have the 
power to mimic the shape of the flowpipe, which means that there may exist no 
solution for the above constraints even if the system is safe. This problem might
be addressed using a piecewise barrier certificate, i.e., cutting the flowpipe into
small pieces so that every piece is straight enough to have a barrier certificate 
of simple form. Unfortunately, this is infeasible because we know nothing about 
the flowpipe locally. Therefore, we have to find another way to proceed. 

Instead of computing a single barrier certificate, we propose to compute
barrier tubes to piecewise over-approximate the flowpipe. Concretely, in the
beginning, we first construct a containing box, called enclosure, for the
initial set using the interval approach [29] and simulation; then, using linear
programming, we compute a group of barrier functions which work together to form
a tight tube (called barrier tube) around the flowpipe. Similarly, taking the
intersection of the barrier tube and the boundary of the box as the new initial
set, we repeat the previous operations to obtain successive barrier tubes step
by step. The key point here is how to compute a group of tightly enclosing
barriers around the flowpipe without a constraint on the unsafe set inside the
box. Our basic idea is to construct a group of auxiliary state sets
$\mathcal{U}$ around the flowpipe and then, for each $U_i \in \mathcal{U}$, we
compute a barrier certificate between $U_i$ and the flowpipe. If a barrier
certificate is found, we expand $U_i$ towards the flowpipe iteratively until no
more barrier certificate can be found; otherwise, we shrink $U_i$ away from the
flowpipe until a barrier certificate is found. Since the auxiliary sets are
distributed around the flowpipe, so is the barrier tube. The benefit of such
piecewise barrier tubes is that they are time independent, and hence can avoid
the issue of stretched flowpipe segments caused by speed differences between
trajectories. Moreover, usually a small number of BTs can form a tight
over-approximation of the flowpipe, which means that less computation is needed
to decide the intersection of PBT and the unsafe set.


The main contributions of this paper are as follows: 


1. We transform the constraint-solving problem for barrier certificates into a 
linear programming problem using Handelman representation [15]; 

2. We introduce PBT to over-approximate the flowpipe of nonlinear systems, 
thus dealing with flowpipes independent of time and hence avoiding the error 
explosion caused by stretched flowpipe segments; 

3. We implement a prototype in C++ to compute PBTs automatically and we
show the effectiveness of our approach by comparing it with
state-of-the-art tools for reachability analysis of polynomial nonlinear systems,
such as CORA [1] and Flow* [3].


The paper is organized as follows. Section 2 is devoted to the preliminaries. 
Section 3 shows how to compute barrier certificates using Handelman represen- 
tation, while in Sect. 4 we present a method to compute Piecewise Barrier Tubes. 
Section 5 provides our experimental results and we conclude in Sect. 6. 


2 Preliminaries 


In this section, we recall some concepts used throughout the paper. We first
clarify some notation conventions. If not specified otherwise, we use boldface
lower case letters to denote vectors, we use $\mathbb{R}$ for the real number field and
$\mathbb{N}$ for the set of natural numbers, and we consider multivariate polynomials in
$\mathbb{R}[x]$, where the components of $x$ act as indeterminates. In addition, for all the
polynomials $B(u, x)$, we denote by $u$ the vector composed of all the $u_i$ and
denote by $x$ the vector composed of all the remaining variables $x_i$ that occur in
the polynomial. We use $\mathbb{R}_{\ge 0}$ and $\mathbb{R}_{>0}$ to denote the sets of nonnegative real
numbers and positive real numbers, respectively.

Let $P \subseteq \mathbb{R}^n$ be a convex and compact polyhedron with non-empty interior,
bounded by linear polynomials $p_1, \dots, p_m \in \mathbb{R}[x]$. Without loss of generality,
we may assume $P = \{x \in \mathbb{R}^n \mid p_i(x) \ge 0,\ i = 1, \dots, m\}$.

Next, we present the notation of the Lie derivative, which is widely used in
differential geometry. Let $f : \mathbb{R}^n \to \mathbb{R}^n$ be a continuous vector
field such that $\dot{x}_i = f_i(x)$, where $\dot{x}_i$ is the time derivative of $x_i(t)$.


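Definition 1 (Lie derivative). For a given polynomial $p \in \mathbb{R}[x]$ over
$x = (x_1, \dots, x_n)$ and a continuous system $\dot{x} = f$, where $f = (f_1, \dots, f_n)$,
the Lie derivative of $p$ along $f$ of order $k$ is defined as follows.

$$
\mathcal{L}_f^k p \stackrel{\mathrm{def}}{=}
\begin{cases}
p, & k = 0 \\
\left\langle \dfrac{\partial}{\partial x} \mathcal{L}_f^{k-1} p,\ f \right\rangle, & k \ge 1
\end{cases}
$$

Essentially, the $k$-th order Lie derivative of $p$ is the $k$-th derivative of $p$ w.r.t.
time, i.e., it reflects the change of $p$ over time. We write $\mathcal{L}_f p$ for $\mathcal{L}_f^1 p$.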
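
Definition 1 can be illustrated with a few lines of Python. The sketch below is not part of the authors' tool: it assumes a sparse dictionary representation of polynomials (exponent tuple to coefficient), and the helper names `add`, `mul`, `partial` and `lie_derivative` as well as the test polynomial are illustrative choices. The vector field is the linear one used later in Example 2.

```python
# A polynomial in n variables is stored as {exponent tuple: coefficient}.
def add(p, q):
    out = dict(p)
    for mono, c in q.items():
        out[mono] = out.get(mono, 0) + c
    return {m: c for m, c in out.items() if c != 0}

def mul(p, q):
    out = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = tuple(a + b for a, b in zip(m1, m2))
            out[m] = out.get(m, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def partial(p, i):
    """Partial derivative of p with respect to the i-th variable."""
    out = {}
    for mono, c in p.items():
        if mono[i] > 0:
            m = mono[:i] + (mono[i] - 1,) + mono[i + 1:]
            out[m] = out.get(m, 0) + c * mono[i]
    return out

def lie_derivative(p, f, k=1):
    """k-th order Lie derivative of p along f (a list of polynomials)."""
    for _ in range(k):
        nxt = {}
        for i, fi in enumerate(f):
            nxt = add(nxt, mul(partial(p, i), fi))
        p = nxt
    return p

# Vector field x' = 2x + 3y, y' = -4x + 2y.
f = [{(1, 0): 2, (0, 1): 3}, {(1, 0): -4, (0, 1): 2}]
# Arbitrary test polynomial p(x, y) = x^2 + y (an assumption of this sketch).
p = {(2, 0): 1, (0, 1): 1}
first = lie_derivative(p, f)       # 2x(2x + 3y) + (-4x + 2y)
second = lie_derivative(p, f, 2)   # Lie derivative of `first` along f
```

For this input, `first` represents $4x^2 + 6xy - 4x + 2y$, which matches differentiating $p$ along a trajectory by the chain rule.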

In this paper, we focus on semialgebraic nonlinear systems, which are defined 
as follows. 


Definition 2 (Semialgebraic system). A semialgebraic system is a tuple
$M = (X, f, X_0, I)$, where

1. $X \subseteq \mathbb{R}^n$ is the state space of the system $M$,

2. $f \in \mathbb{R}[x]^n$ is a locally Lipschitz continuous vector function,

3. $X_0 \subseteq X$ is the initial set, which is semialgebraic [40],

4. $I$ is the invariant of the system.


The local Lipschitz continuity guarantees the existence and uniqueness of
the solution of the differential equation $\dot{x} = f$ locally. A trajectory of a
semialgebraic system is defined as follows.


Definition 3 (Trajectory). Given a semialgebraic system $M$, a trajectory
originating from a point $x_0 \in X_0$ to time $T > 0$ is a continuous and differentiable
function $\zeta(x_0, t) : [0, T) \to \mathbb{R}^n$ such that (1) $\zeta(x_0, 0) = x_0$, and
(2) $\forall \tau \in [0, T)$: $\frac{d\zeta}{dt}\big|_{t=\tau} = f(\zeta(x_0, \tau))$. $T$ is assumed to be
within the maximal interval of existence of the solution from $x_0$.


For ease of readability, we also use $\zeta(t)$ for $\zeta(x_0, t)$. In addition, we use
$\mathit{Flow}_f(X_0)$ to denote the flowpipe of the initial set $X_0$, i.e.,


$$\mathit{Flow}_f(X_0) \stackrel{\mathrm{def}}{=} \{\zeta(x_0, t) \mid x_0 \in X_0,\ t \in \mathbb{R}_{\ge 0},\ \dot{\zeta} = f(\zeta)\} \tag{1}$$


Definition 4 (Safety). Given an unsafe set $X_U \subseteq X$, a semialgebraic system
$M = (X, f, X_0, I)$ is said to be safe if no trajectory $\zeta(x_0, t)$ of $M$ satisfies
$\exists \tau \in \mathbb{R}_{\ge 0} : \zeta(\tau) \in X_U$, where $x_0 \in X_0$.


3 Computing Barrier Certificates


Given a semialgebraic system $M$, a barrier certificate is a real-valued function
$B(x)$ such that (1) $B(x) > 0$ for all $x$ in the initial set; (2) $B(x) < 0$ for all $x$ in
the unsafe set; (3) no trajectory can escape from the region where $B(x) > 0$. Then,
the hyper-surface $\{x \in \mathbb{R}^n \mid B(x) = 0\}$ forms a barrier separating the flowpipe
from the unsafe set. To compute such a barrier certificate, the most common
approach is template-based constraint solving: first, derive a sufficient
condition for the above requirements; then, set up a template polynomial $B(u, x)$
of fixed degree; and finally, solve the constraint on $u$ derived from the sufficient
condition on $B(u, x)$. Several sufficient conditions are available for
this purpose [13, 22, 27]. In order to have an efficient constraint-solving method,
we adopt the following condition [33].


Theorem 1. Given a semialgebraic system $M$, let $X_0$ and $X_U$ be the initial set
and the unsafe set, respectively. The system is guaranteed to be safe if there exists
a real-valued function $B(x)$ such that


$$\forall x \in X_0 : B(x) > 0 \tag{2}$$
$$\forall x \in I : \mathcal{L}_f B > 0 \tag{3}$$
$$\forall x \in X_U : B(x) < 0 \tag{4}$$


In Theorem 1, condition (3) means that all the trajectories of the system
always point in the increasing direction of the level sets of $B(x)$ in the region $I$.
Therefore, no trajectory starting from the initial set can cross the zero level
set. The benefit of this condition is that it can be solved more efficiently than
other existing conditions [13, 22], although it is relatively conservative. The most
widely used approach is to transform the constraint-solving problem into a
sum-of-squares (SOS) programming problem [33], which can be solved in polynomial
time. However, a serious problem with the SOS programming based approach is
that automatic generation of polynomial templates is very hard to perform. We
now show an example to demonstrate the reason. For simplicity, we assume that
the initial set, the unsafe set and the invariant are defined by the polynomial
inequalities $X_0(x) \ge 0$, $X_U(x) \ge 0$ and $I(x) \ge 0$, respectively. Then the SOS
relaxation of Theorem 1 is that the following polynomials are all SOS:


$$B(x) - \mu_1(x) X_0(x) - \epsilon_1 \tag{5}$$
$$\mathcal{L}_f B - \mu_2(x) I(x) - \epsilon_2 \tag{6}$$
$$-B(x) - \mu_3(x) X_U(x) - \epsilon_3 \tag{7}$$


where $\mu_i(x)$, $i = 1, \dots, 3$, are SOS polynomials as well and $\epsilon_i > 0$, $i = 1, \dots, 3$.
Suppose the degrees of $X_0(x)$, $I(x)$ and $X_U(x)$ are all odd numbers. Then, the
degree of the template for $B(x)$ must be an odd number too. The reason is that,
if $\deg(B)$ is an even number, then in order for the first and third polynomials to be
SOS polynomials, $\deg(B)$ must be greater than both $\deg(\mu_3 X_U)$ and $\deg(\mu_1 X_0)$,
which are odd numbers. However, since the first and third conditions contain $B(x)$
and $-B(x)$ respectively, their leading monomials must have opposite signs,
which means that they cannot be SOS polynomials simultaneously. Moreover, the
degrees of the templates for the auxiliary polynomials $\mu_1(x)$, $\mu_3(x)$ must also be
chosen properly so that $\deg(\mu_1 X_0) = \deg(\mu_3 X_U) = \deg(B)$, because only in this
way can the leading monomials (which have an odd degree) of (5) and (7) be
cancelled so that the resulting polynomial can be SOS. Similarly,
in order to make the second polynomial SOS as well, one has to choose an
appropriate degree for $\mu_2(x)$ according to the degrees of $\mathcal{L}_f B$ and $I(x)$. As a
result, the tangled constraints on the relevant template polynomials reduce the
power of SOS programming significantly.

For the above reason, and inspired by the work [38], we use the Handelman
representation to relax Theorem 1. We assume that the initial set $X_0$, the unsafe set
$X_U$ and the invariant $I$ are all convex and compact polyhedra, i.e.,
$X_0 = \{x \in \mathbb{R}^n \mid p_1(x) \ge 0, \dots, p_{m_1}(x) \ge 0\}$,
$I = \{x \in \mathbb{R}^n \mid q_1(x) \ge 0, \dots, q_{m_2}(x) \ge 0\}$
and $X_U = \{x \in \mathbb{R}^n \mid r_1(x) \ge 0, \dots, r_{m_3}(x) \ge 0\}$, where $p_i(x)$, $q_j(x)$, $r_k(x)$ are
linear polynomials. Then, we have the following theorem.

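Theorem 2. Given a semialgebraic system $M$, let $X_0$, $X_U$ and $I$ be defined as
above. The system is guaranteed to be safe if there exists a real-valued polynomial
function $B(x)$ such that


$$B(x) = \sum_{|\alpha| \le M_1} \lambda_\alpha\, p_1^{\alpha_1} \cdots p_{m_1}^{\alpha_{m_1}} + \epsilon_1 \tag{8}$$

$$\mathcal{L}_f B = \sum_{|\beta| \le M_2} \lambda_\beta\, q_1^{\beta_1} \cdots q_{m_2}^{\beta_{m_2}} + \epsilon_2 \tag{9}$$

$$-B(x) = \sum_{|\gamma| \le M_3} \lambda_\gamma\, r_1^{\gamma_1} \cdots r_{m_3}^{\gamma_{m_3}} + \epsilon_3 \tag{10}$$


where $\lambda_\alpha, \lambda_\beta, \lambda_\gamma \in \mathbb{R}_{\ge 0}$, $\epsilon_i \in \mathbb{R}_{>0}$ and $M_i \in \mathbb{N}$, $i = 1, \dots, 3$.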


Theorem 2 provides us with an alternative to SOS programming for finding a
barrier certificate $B(x)$ by transforming the problem into a linear programming problem.
The basic idea is that we first set up a template $B(u, x)$ of fixed degree as well as
appropriate $M_i$, $i = 1, \dots, 3$, that make both sides of the three identities
(8)-(10) have the same degree. Since (8)-(10) are identities, the coefficients of
the corresponding monomials on both sides must be identical as well. Thus,
we derive a system $S$ of linear equations and inequalities over $u, \lambda_\alpha, \lambda_\beta, \lambda_\gamma$.
Now, finding a barrier certificate amounts to finding a feasible solution for $S$, which
can be done by linear programming. Compared to the SOS programming based
approach, this approach is more flexible in choosing the polynomial template as
well as other parameters. We now consider a linear system to show how it works.


Fig. 2. (a) Linear barrier certificate (straight red line) for Example 2. Rectangle in
green: initial set, rectangle in red: unsafe set. (b) PBT for the running Example 5,
consisting of 45 BTs. (c) Enclosure (before bloating) for flowpipe of Example 3 (green
shadow region). (d) Enclosure (after bloating) for flowpipe of Example 3. (Color figure
online)


Example 2. Given a 2D system defined by $\dot{x} = 2x + 3y$, $\dot{y} = -4x + 2y$, let
$X_0 = \{(x, y) \in \mathbb{R}^2 \mid p_1 = x + 100 \ge 0,\ p_2 = -90 - x \ge 0,\ p_3 = y + 45 \ge 0,\ p_4 = -40 - y \ge 0\}$,
$I = \{(x, y) \in \mathbb{R}^2 \mid q_1 = x + 110 \ge 0,\ q_2 = -80 - x \ge 0,\ q_3 = y + 45 \ge 0,\ q_4 = -20 - y \ge 0\}$ and
$X_U = \{(x, y) \in \mathbb{R}^2 \mid r_1 = x + 98 \ge 0,\ r_2 = -90 - x \ge 0,\ r_3 = y + 24 \ge 0,\ r_4 = -20 - y \ge 0\}$.
Assume $B(u, x) = u_1 + u_2 x + u_3 y$ and $M_i = \epsilon_i = 1$ for $i = 1, \dots, 3$. Then we obtain
the following polynomial identities according to Theorem 2:


$$u_1 + u_2 x + u_3 y - \sum_{i=1}^{4} \lambda_{1i}\, p_i - \epsilon_1 = 0$$

$$u_2 (2x + 3y) + u_3 (-4x + 2y) - \sum_{j=1}^{4} \lambda_{2j}\, q_j - \epsilon_2 = 0$$

$$-(u_1 + u_2 x + u_3 y) - \sum_{k=1}^{4} \lambda_{3k}\, r_k - \epsilon_3 = 0$$


where $\lambda_{ij} \ge 0$ for $i = 1, \dots, 3$, $j = 1, \dots, 4$. By collecting the coefficients of $x$, $y$
in the above polynomials, we obtain a system $S$ of linear polynomial equations
and inequalities over $u_i$, $\lambda_{jk}$. By solving $S$ using linear programming, we obtain
a feasible solution, and Fig. 2a shows the computed linear barrier certificate.
Note that, for the aforementioned reason, it is impossible to find a linear barrier
certificate using SOS programming for this example.
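
To make the coefficient matching concrete, the following Python sketch checks one feasible point of the system $S$ for Example 2 with exact rational arithmetic. The candidate values of $u$ and $\lambda_{ij}$ below were derived by hand for this illustration and are an assumption of the sketch; they are not the solution reported by the authors or returned by their LP solver, and the helper names `combo` and `lie` are hypothetical.

```python
from fractions import Fraction as Fr

# Linear polynomials c0 + cx*x + cy*y are stored as tuples (c0, cx, cy).
P = [(100, 1, 0), (-90, -1, 0), (45, 0, 1), (-40, 0, -1)]   # X_0: p_i >= 0
Q = [(110, 1, 0), (-80, -1, 0), (45, 0, 1), (-20, 0, -1)]   # I:   q_j >= 0
R = [(98, 1, 0), (-90, -1, 0), (24, 0, 1), (-20, 0, -1)]    # X_U: r_k >= 0

def combo(lams, polys, eps):
    """Return sum_i lams[i] * polys[i] + eps as a coefficient tuple."""
    assert all(l >= 0 for l in lams) and eps > 0
    out = [Fr(eps), Fr(0), Fr(0)]
    for l, poly in zip(lams, polys):
        for t in range(3):
            out[t] += Fr(l) * poly[t]
    return tuple(out)

def lie(u):
    """L_f B for B = u1 + u2*x + u3*y and f = (2x + 3y, -4x + 2y)."""
    _, u2, u3 = u
    return (0, 2 * u2 - 4 * u3, 3 * u2 + 2 * u3)

# Hand-picked candidate (assumption of this sketch, not the paper's solution).
u = (-1919, -17, -10)                        # B(x, y) = -1919 - 17x - 10y
lam1 = (1, 18, 0, 10)
lam2 = (6, 0, Fr(759, 25), Fr(2534, 25))
lam3 = (18, 1, 11, 1)

ok8 = combo(lam1, P, 1) == u                      # first identity
ok9 = combo(lam2, Q, 1) == lie(u)                 # second identity
ok10 = combo(lam3, R, 1) == tuple(-c for c in u)  # third identity
print(ok8, ok9, ok10)
```

All three comparisons hold, so $B(x, y) = -1919 - 17x - 10y$ is one linear barrier certificate for these sets. In practice the multipliers would come from an LP solver rather than from manual derivation.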


4 Piecewise Barrier Tubes 


In this section, we introduce how to construct PBTs for nonlinear polynomial 
systems. The basic idea of constructing PBT is that, for each segment of the 
flowpipe, an enclosure box is first constructed and then, a BT is constructed to 
form a tighter over-approximation for the flowpipe segment inside the box. 


4.1 Constructing an Enclosure Box 


Given an initial set, the first task is to construct an enclosure box for the initial
set and the following segment of the flowpipe. As pointed out in Sect. 1, one
principle for constructing an enclosure box is to simplify the shape of the flowpipe
segment, or in other words, to approximately bound the twisting of trajectories
by some $\theta$ in the box, where the twisting of a trajectory is defined as follows.


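Definition 5 (Twisting of a trajectory). Let $M$ be a continuous system and
$\zeta(t)$ be a trajectory of $M$. Then, $\zeta(t)$ is said to have a twisting of $\theta$ on the
time interval $I = [T_1, T_2]$, written as $\xi_I(\zeta)$, if it satisfies $\xi_I(\zeta) = \theta$, where


$$\xi_I(\zeta) \stackrel{\mathrm{def}}{=} \sup_{t_1, t_2 \in I} \arccos\left( \frac{\langle \dot{\zeta}(t_1), \dot{\zeta}(t_2) \rangle}{\|\dot{\zeta}(t_1)\|\, \|\dot{\zeta}(t_2)\|} \right)$$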
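
As an illustration of Definition 5, the sketch below approximates the twisting from sampled velocity vectors and uses it as a stopping rule for a simulation that also stops at a maximal distance $d$ (the kind of bounded simulation used by Algorithm 1). The explicit Euler integrator, the step size `h` and the function names are assumptions of this sketch, not details taken from the authors' implementation.

```python
import math

def angle(v, w):
    """Angle between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(v, w))
    nv = math.sqrt(sum(a * a for a in v))
    nw = math.sqrt(sum(b * b for b in w))
    return math.acos(max(-1.0, min(1.0, dot / (nv * nw))))

def twisting(velocities):
    """Largest pairwise angle among sampled velocity vectors."""
    return max(angle(v, w) for v in velocities for w in velocities)

def bounded_simulation(f, x0, theta, d, h=1e-3, max_steps=1_000_000):
    """Integrate x' = f(x) from x0 with explicit Euler steps of size h until
    the sampled twisting reaches theta or the distance from x0 reaches d.
    Returns the elapsed time, a candidate time step for the enclosure."""
    x, t = list(x0), 0.0
    seen = [f(x)]                      # velocities visited so far
    for _ in range(max_steps):
        x = [xi + h * vi for xi, vi in zip(x, seen[-1])]
        t += h
        v = f(x)
        if max(angle(v, w) for w in seen) >= theta:
            break                      # twisting bound reached
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x0)))
        if dist >= d:
            break                      # distance bound reached
        seen.append(v)
    return t

# Rotation field: the velocity direction turns at unit rate, so the
# twisting bound pi/18 is reached after roughly pi/18 time units.
t_rot = bounded_simulation(lambda p: [-p[1], p[0]], [1.0, 0.0], math.pi / 18, 10.0)
# Constant field: no twisting, so the distance bound d = 0.5 stops the run.
t_line = bounded_simulation(lambda p: [1.0, 0.0], [0.0, 0.0], math.pi / 18, 0.5)
```

Comparing each new velocity against all earlier ones costs quadratic time in the number of steps, which is acceptable for the short segments considered here.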


The basic idea to construct an enclosure box is depicted in Algorithm 1. 


Algorithm 1. Algorithm to construct an enclosure box
input : $M$: dynamics of the system; $n$: dimension of system; $X_0$: initial set;
        $\theta$: twisting of simulation; $d$: maximum distance of simulation;
output: $E$: an enclosure box containing $X_0$; $P$: plane where flowpipe exits;
        $G$: range of intersection of $\mathit{Flow}_f(X_0)$ with plane $P$ by simulation


 1 sample a set $S_0$ of points from $X_0$;
 2 select a point $x_0 \in S_0$;
 3 find a time step size $\Delta T_0$ by $(\theta, d)$-bounded simulation for $x_0$;
 4 $\Delta T \leftarrow \Delta T_0$;
 5 while $\Delta T > \epsilon$ do
 6     [found, $E$] $\leftarrow$ find an enclosure box by interval arithmetic using $\Delta T$;
 7     if found then
 8         do a simulation for all $x_i \in S_0$; select the plane $P$ which intersects
           the most simulations; generate $G$;
 9         bloat $E$ s.t. $\mathit{Flow}_f(X_0)$ gets out of $E$ only through the facet in $P$;
10         break;
11     else
12         $\Delta T \leftarrow 1/2 * \Delta T$;

Remark 1. In Algorithm 1, we use interval arithmetic [29] and simulation to
construct an enclosure box $E$ for a given initial set and its following flowpipe
segment. Meanwhile, we obtain a coarse range of the intersection of the flowpipe
and the boundary of the enclosure, which helps to accelerate the construction of
the barrier tube. For simplicity, the enclosure is constructed in such a way that the
flowpipe gets out of the box through a single facet. Given an initial set $X_0$, we
first sample a set $S_0$ of points from $X_0$ for simulation. Then, we select a point
$x_0$ from $S_0$ and do a $(\theta, d)$-simulation on $x_0$ to obtain a time step $\Delta T$. A
$(\theta, d)$-simulation is a simulation that stops either when the twisting of the simulation
reaches $\theta$ or when the distance between $x_0$ and the end point reaches $d$. On the
one hand, by using a small $\theta$, we aim to achieve a straight flowpipe segment.
On the other hand, by specifying a maximal distance $d$, we make sure that the
simulation stops even for a long and straight flowpipe. At each iteration of the while
loop in line 5, we first try to construct an enclosure box by interval arithmetic
over $\Delta T$. If such an enclosure box is created, we then perform a simulation (see
line 8) for all the points in $S_0$ to find the plane $P$ of the facet which intersects
the most simulations. The idea behind line 9 is that, in order to better
over-approximate the intersection of the flowpipe with the boundary of the box
using intervals, we push the other planes outwards to make $P$ the only plane
through which the flowpipe gets out of the box. Certainly, by simulation alone we cannot
guarantee that the flowpipe does not intersect the other facets. Therefore, we
have the following theorem for the decision.


Theorem 3. Given a semialgebraic system $M$ and an initial set $X_0$, let the box $E$
be an enclosure of $X_0$ and $F_i$ be a facet of $E$. Then, $(\mathit{Flow}_f(X_0) \cap E) \cap F_i = \emptyset$
if there exists a barrier certificate $B_i(x)$ for $X_0$ and $F_i$ inside $E$.


Remark 2. The proof of Theorem 3 follows directly from the definition of a
barrier certificate and is omitted here. Therefore, to make sure that
the flowpipe does not intersect the facet $F_i$, we only need to find a barrier
certificate, which can be done using the approach presented in Sect. 3. Moreover, if
no barrier certificate can be found, we further bloat the facet. Next, we again use
the running Example 1 to demonstrate the process of constructing an enclosure.


Example 3 (running example). Consider the system in Example 1 and the initial
set $x = 1.0$, $-1.05 \le y \le -0.95$. Let the bound on the twisting of simulation be
$\theta = \pi/18$; then the time step size we computed for interval evaluation is $\Delta T = 0.2947$.
The corresponding enclosure computed by interval arithmetic is shown in Fig. 2c.
Furthermore, by simulation, we know that the flowpipe can reach both the left facet
and the top facet. Therefore, we have two options to bloat the facet: bloat the left
facet to make the flowpipe intersect the top facet only, or bloat the top facet
to make the flowpipe intersect the left facet only. In this example, we choose the
latter option and the bloated enclosure is shown in Fig. 2d. In this way, we can
over-approximate the intersection of the flowpipe and the facet by intervals if we
can obtain its boundary on every side. This can be achieved by finding a barrier
tube.


4.2 Compute a Barrier Tube Inside a Box 


An important fact about the flowpipe of a continuous system is that it tends to
be straight if it is short enough, given that the initial set is straight as well
(otherwise, we can split it). If there is a small box $E$ around a straight
flowpipe, it is easy to compute a barrier certificate for a given initial set
and unsafe set inside $E$. A barrier tube for the flowpipe in $E$ is a group of barrier
certificates which form a tube around the flowpipe inside $E$. Formally,


Definition 6 (Barrier Tube). Given a semialgebraic system $M$, a box $E$ and
an initial set $X_0 \subseteq E$, a barrier tube is a set of real-valued functions
$BT = \{B_i(x), i = 1, \dots, m\}$ such that for all $B_i(x) \in BT$: (1) $\forall x \in X_0 : B_i(x) > 0$
and (2) $\forall x \in E : \mathcal{L}_f B_i > 0$.

According to Definition 6, a barrier tube $BT$ is defined by a set of real-valued
functions; every function inequality $B_i(x) > 0$ is an invariant of $M$ in $E$, and
so is their conjunction. The property of a barrier tube $BT$ is formally described
in the following theorem.


Theorem 4. Given a semialgebraic system $M$, a box $E$ and an initial set $X_0 \subseteq E$,
let $BT = \{B_i(x) : i = 1, \dots, m\}$ be a barrier tube of $M$ and
$\Omega = \{x \in \mathbb{R}^n \mid \bigwedge_{B_i \in BT} B_i(x) > 0\}$; then $\mathit{Flow}_f(X_0) \cap E \subseteq \Omega \cap E$.


Remark 3. Theorem 4 states that an arbitrary barrier tube forms an
over-approximation of the flowpipe in the box $E$. Compared to a single barrier
certificate, multiple barrier certificates can over-approximate the flowpipe more
precisely. However, since there is no constraint on unsafe sets in Definition 6,
a barrier tube satisfying the definition could be very conservative. In order to
obtain an accurate approximation of the flowpipe, we choose to create additional
auxiliary constraints.


Auxiliary Unsafe Set (AUS). To obtain an accurate barrier tube, there are
two main questions to be answered: (1) How many barrier certificates are needed?
and (2) How do we control their positions to make the tube well-shaped to better
over-approximate the flowpipe? The answer to the first question is quite simple:
the more, the better. This will be explained later on. For the second question,
the answer is to construct a group of properly distributed auxiliary state sets
(AUSs). Each AUS is used as an unsafe set $U_i$ for the system and
then we compute a barrier certificate $B_i$ for $U_i$ according to Theorem 2. Since
the zero level set of $B_i$ serves as a barrier between the flowpipe and $U_i$, the
space where a barrier could appear is fully determined by the position of $U_i$.
Roughly speaking, when $U_i$ is far away from the flowpipe, the space for a barrier
to exist is wide as well. Correspondingly, the barrier certificate found is
usually located far away from the flowpipe as well. As $U_i$ gets closer to
the flowpipe, the space for barrier certificates contracts towards the flowpipe
accordingly. Therefore, by expanding $U_i$ towards the flowpipe, we can get more
precise over-approximations of the flowpipe.


Why Multiple AUS? Although the accuracy of the barrier certificate
over-approximation can be improved by expanding the AUS towards the flowpipe,
the capability of a single barrier certificate is very limited because it can erect a
barrier which only matches a single profile of the flowpipe. However, if we have
a set $\mathcal{U}$ of AUSs which are distributed evenly around the flowpipe and there is a
barrier certificate $B_i$ for each $U_i \in \mathcal{U}$, these barrier certificates are able to
over-approximate the flowpipe from a number of profiles. Therefore, increasing
the number of AUSs can increase the quality of the over-approximation as well.
Furthermore, if all these auxiliary sets are connected, all the barriers form
a tube surrounding the flowpipe. Therefore, if we can create a series of boxes
piecewise covering the flowpipe and then construct a barrier tube for every piece
of the flowpipe, we obtain an over-approximation of the flowpipe by a PBT.
Based on the above idea, we provide Algorithm 2 to compute a barrier tube.

Algorithm 2. Algorithm to compute barrier tube
input : $M$: dynamics of the system; $X_0$: initial set;
        $E$: interval enclosure of initial set;
        $G$: interval approx. of $(\partial E \cap \mathit{Flow}_f(X_0))$ by simulation;
        $P$: plane where flowpipe exits from box;
        $D$: candidate degree list for template polynomial;
        $\epsilon$: difference in size between AUS (auxiliary unsafe set)
output: $BT$: barrier tube; $X_0'$: interval over-approximation of $(BT \cap E)$


 1 foreach $G_{ij}$: a facet of $G$ do
 2     found $\leftarrow$ false;
 3     foreach $d \in D$ do
 4         AUS $\leftarrow$ CreateAUS($G$, $P$, $G_{ij}$);
 5         while true do
 6             [found, $B_{ij}$] $\leftarrow$ ComputeBarrierCert($X_0$, $E$, AUS, $d$);
 7             if found then AUS' $\leftarrow$ Expand(AUS);
 8             else AUS' $\leftarrow$ Contract(AUS);
 9             if Diff(AUS', AUS) < $\epsilon$ then break;
10             else AUS $\leftarrow$ AUS';
11         if found then $BT \leftarrow$ Push($BT$, $B_{ij}$); break;
12     if not found then return FAIL;
13 return SUCCEED;


Remark 4. In Algorithm 2, for an $n$-dimensional flowpipe segment, we aim to
build a barrier tube composed of $2(n-1)$ barrier certificates, which means we
need to construct $2(n-1)$ AUSs. According to Algorithm 1, we know that the
plane $P$ is the only exit of the flowpipe from the enclosure $E$ and $G$ is roughly
the region where they intersect. Let $F^G$ be the facet of $E$ that contains $G$; then
for every facet $F^G_{ij}$ of $F^G$, we can take an $(n-1)$-dimensional rectangle between
$F^G_{ij}$ and $G_{ij}$ as an AUS, where $G_{ij}$ is the facet of $G$ adjacent to $F^G_{ij}$. Therefore,
enumerating all the facets of $G$ in line 1 produces $2(n-1)$ positions for the
AUS. The loop in line 3 attempts to find a polynomial barrier certificate
for the different degrees in $D$. In the while loop in line 5, we iteratively compute the best
barrier certificate by adjusting the width of the AUS through binary search until
the difference in width between two successive AUSs is less than the specified
threshold $\epsilon$.
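
The width adjustment described in Remark 4 can be sketched as a plain binary search. In the Python sketch below, `has_barrier` is a hypothetical oracle standing in for the LP feasibility check of Theorem 2, and the sketch assumes the oracle is monotone (if a barrier exists for some AUS width, one exists for every smaller width). Unlike the listing of Algorithm 2, it remembers the last width that succeeded, which is a design choice of this illustration.

```python
def tightest_aus_width(has_barrier, max_width, eps):
    """Binary search for the largest AUS width that still admits a barrier.

    has_barrier(width) -> bool is a hypothetical, monotone oracle.
    Returns (found, width): whether any call succeeded, and the largest
    width that succeeded (0.0 if none did)."""
    lo, hi = 0.0, max_width        # lo: largest width known to work
    width = max_width / 2
    found_any = False
    while True:
        if has_barrier(width):
            found_any = True
            lo = width             # expand the AUS towards the flowpipe
        else:
            hi = width             # contract the AUS away from the flowpipe
        new_width = (lo + hi) / 2
        if abs(new_width - width) < eps:
            break                  # successive AUSs differ by less than eps
        width = new_width
    return found_any, lo

# Toy oracle: a barrier exists exactly when the AUS is at most 0.3 wide.
found, w = tightest_aus_width(lambda width: width <= 0.3, 1.0, 0.001)
```

With the toy oracle the search stops with `w` just below 0.3, within twice the threshold of the true boundary.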


Example 4 (Running example). Consider the initial set and the enclosure
computed in Example 3. We use Algorithm 2 to compute a barrier tube. The
initial set is $X_0 = [1.0, 1.0] \times [-1.05, -0.95]$ and the enclosure of $X_0$ is
$E = [0.84, 1.01] \times [-1.1, -0.75]$, $G = [0.84, 0.84] \times [-0.91, -0.80]$, the plane $P$ is
$x = 0.84$, $D = \{2\}$ and $\epsilon = 0.001$. The barrier tube consists of two barrier
certificates. As shown in Fig. 3, each of the barrier certificates is derived from
an AUS (red line segment) which is located respectively on the bottom-left and
top-left boundary of $E$.

Fig. 3. Computing process of BT for Example 4. Blue line segment: initial set, red line
segment: AUS. Figure a-l show how intermediate barrier certificates changed with the
width of the AUSs and Fig. 1 shows the final BT (shadow region in green). (Color figure
online)


4.3 Compute Piecewise Barrier Tube 


During the computation of a barrier tube by Algorithm 2, we create a series 
of AUSs around the flowpipe, which build up a rectangular enclosure for the 
intersection of the flowpipe and the facet of the enclosure box. As a result, such 
a rectangular enclosure can be taken as an initial set for the following flowpipe 
segment and then Algorithm 2 can be applied repeatedly to compute a PBT. 
The basic procedure to compute PBT is presented in Algorithm 3. 


Remark 5. In Algorithm 3, initially a box that contains the initial set $X_0$ is
constructed using Algorithm 1. The loop in line 2 consists of 3 major parts: (1)
In lines 3-6, a barrier tube $BT$ is first computed using Algorithm 2. The while
loop keeps shrinking the box until a barrier tube is found; (2) In line 8, the initial
set $X_0$ is updated for the next box; (3) In line 9, a new box is constructed to
contain $X_0$ and the process is repeated.


Example 5 (Running example). Let us consider again the running example. We
set the length of the PBT to 45, and the PBT we obtained is shown in Fig. 2b.
Compared to the interval over-approximation of the Taylor model obtained using
Flow*, the computed PBT consists of a significantly reduced number of segments
and is more precise owing to the absence of stretching.


Safety Verification Based on PBT. The idea of safety verification based on
PBT is straightforward. Given an unsafe set $X_U$, for each intermediate initial set
$X_0$ and the corresponding enclosure box $E$, we first check whether $X_U \cap E = \emptyset$.
If the intersection is not empty, we further try to find a barrier certificate between $X_U$ and the
flowpipe of $X_0$ inside $E$. If it is empty or a barrier is found, we continue to compute a

Algorithm 3. Algorithm to compute PBT
input : $M$: dynamics of the system; $X_0$: initial set;
        $N$: length of piecewise barrier tube
output: PBT: piecewise barrier tube


1 $E \leftarrow$ construct an initial box containing $X_0$;
2 for $i \leftarrow 1$ to $N$ do
3     [Found, $BT$] $\leftarrow$ findBarrierTube($E$, $X_0$);
4     while not Found do
5         $E \leftarrow$ Shrink($E$);
6         [Found, $BT$] $\leftarrow$ findBarrierTube($E$, $X_0$);
7     if Found then
8         $X_0 \leftarrow$ OverApprox($BT \cap$ Facet($E$));
9         $E \leftarrow$ construct the next box containing $X_0$;


Table 1. Model definitions


Model          | Dynamics                     | Initial set $X_0$       | Time horizon (TH)
Controller 2D  | $\dot{x} = xy + y^3 + 2$     | $x \in [29.9, 30.1]$    | 0.0125
               | $\dot{y} = x^2 + 2x - 3y$    | $y \in [-38, -36]$      |
Van der Pol    | $\dot{x} = y$                | $x \in [1, 1.5]$        | 6.74
Oscillator     | $\dot{y} = y - x - x^2 y$    | $y \in [2.0, 2.45]$     |
Lotka-Volterra | $\dot{x} = x(1.5 - y)$       | $x \in [4.5, 5.2]$      | 3.2
               | $\dot{y} = -y(3 - x)$        | $y \in [1.8, 2.2]$      |
Controller 3D  | $\dot{x} = 10(y - x)$        | $x \in [1.79, 1.81]$    | 0.51
               | $\dot{y} = x^3$              | $y \in [1.0, 1.1]$      |
               | $\dot{z} = xy - 2.667z$      | $z \in [0.5, 0.6]$      |

longer PBT. The refinement of PBT computation can be achieved by using a
smaller $E$ and a higher degree $d$ for the template polynomial.
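
The per-segment check described above can be sketched as follows, assuming that both the unsafe set and the enclosure are axis-aligned boxes given as lists of `(low, high)` intervals. The callback `find_barrier` is a hypothetical stand-in for the barrier certificate search of Sect. 3, and the unsafe boxes in the usage lines are made-up values for illustration only.

```python
def boxes_intersect(a, b):
    """a, b: lists of (low, high) intervals, one pair per dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(a, b))

def segment_is_safe(unsafe, enclosure, find_barrier):
    """Return True if the flowpipe segment inside `enclosure` is shown safe."""
    if not boxes_intersect(unsafe, enclosure):
        return True                          # X_U and E are disjoint
    return find_barrier(unsafe, enclosure)   # otherwise a barrier is required

# Enclosure E taken from Example 4; the unsafe boxes are hypothetical.
E = [(0.84, 1.01), (-1.1, -0.75)]
far_unsafe = [(2.0, 3.0), (-1.0, 0.0)]
near_unsafe = [(1.0, 1.2), (-0.8, -0.5)]

safe_far = segment_is_safe(far_unsafe, E, lambda u, e: False)
safe_near = segment_is_safe(near_unsafe, E, lambda u, e: False)
```

The cheap interval test filters out most segments, so the more expensive barrier search only runs for enclosures that actually touch the unsafe set.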


5 Implementation and Experiments 


We have implemented the proposed approach as a C++ prototype called Piecewise
Barrier Tube Solver (PBTS), choosing Gurobi [12] as our internal linear
programming solver. We have also performed some experiments on a benchmark
of four nonlinear polynomial dynamical systems (described in Table 1) to compare
the efficiency and the effectiveness of our approach w.r.t. other tools. Our
experiments were performed on a desktop computer with a 3.6 GHz Intel Core
i7-7700 8 Core CPU and 32 GB memory. The results are presented in Table 2.


Remark 6. There are a number of outstanding tools for flowpipe computation 
[1,3-5]. Since our approach is to perform flowpipe computation for polynomial 




Table 2. Tool comparison on nonlinear systems. #var: number of variables; T: computing
time; NFS: number of flowpipe segments; DEG: candidate degrees for the template
polynomial (only for PBTS); TH: time horizon for flowpipe (only for Flow* and
CORA). FAIL: failed to terminate under 30 min.


               |      | PBTS                |        | Flow*        | CORA
Model          | #var | T     | NFS | DEG   | TH     | T     | NFS  | T      | NFS
Controller 2D  | 2    | 5.62  | 46  | 2     | 0.0125 | 22.17 | 6250 | FAIL   | -
Van der Pol    | 2    | 13.38 | 110 | 2,3   | 6.74   | 15.28 | 337  | 212.51 | 2523
Lotka-Volterra | 2    | 6.65  | 30  | 3,4   | 3.2    | 10.59 | 3200 | 35.84  | 2903
Controller 3D  | 3    | 83.65 | 15  | 4     | 0.51   | 11.61 | 5100 | 65.18  | 6767


nonlinear systems, we pick two of the most relevant state-of-the-art tools for 
comparison: CORA [1] and Flow* [3]. Note that a big difference between our 
approach and the other two approaches is that PBTS is time-independent, which 
means that we cannot compare PBTS with CORA or Flow* over the exactly 
same time horizon. To be fair, for Flow* and CORA, we have used
the same time horizon for the flowpipe computation, while we have computed 
a slightly longer flowpipe using PBTS. To guide the reader, we have also used 
different plotting colors to visualize the difference between the flowpipes obtained 
from the three different tools. 


Evaluation. As pointed out in Sect. 1, a common problem with the bounded-time
integration based approaches is that the flowpipe segment of a dynamical system
can be extremely stretched with time, so that the interval over-approximation
of the flowpipe segment is very conservative and the solver usually has to stop
prematurely due to the error explosion. This can be seen easily in
Figs. 4, 5, 6 and 7. In particular, for Controller 2D, Flow* gives a quite
nice result in the beginning but starts producing an exploding flowpipe very
quickly. (Note that Flow* offers options to produce better plotting, which,
however, is expensive and was not used for safety verification.) CORA even failed to
give a result after over 30 min of running. This phenomenon reappeared with
both Flow* and CORA for Controller 3D. Notice that most of the time horizons
used in the experiment are basically the time limits that Flow* and CORA can
reach, i.e., a slightly larger value for the time horizon would cause the solvers to
fail. In comparison, our tool has no such problem and can survive a much longer
flowpipe before exploding, or even without exploding, as shown in Fig. 4a.
Another important factor of the approaches is the efficiency. As is shown in
Table 2, our approach is more efficient on the first three examples but slower on
the last example than the other two tools. The reason for this phenomenon is
that the degree $d$ of the template polynomial used in the last example is higher
than the others, and increasing $d$ led to an increase in the number of decision
variables in the linear constraint. This suggests that using a smaller $d$ on shorter
flowpipe segments would be better. In addition, we can also see in Table 2 that
the number of flowpipe segments produced by PBTS is much smaller than that
produced by Flow* and CORA. In this respect, PBTS would be more efficient
on safety verification.


Fig. 4. Flowpipe for Controller 2D. Panel labels recovered: (b) Flow*.

Fig. 5. Flowpipe for Van der Pol Oscillator. Panel labels recovered: (b) CORA, (c) Flow*.

Fig. 6. Flowpipe for Lotka-Volterra.

Fig. 7. Flowpipe (projection) for Controller 3D.
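
The effect of the template degree on the size of the linear program can be estimated with two binomial coefficients. The sketch below is an illustration, not a measurement of PBTS: it counts the coefficients of a full polynomial template and the multiplier products with $|\alpha| \le M$ (including the constant product), and the function names are hypothetical. The authors' implementation may prune or arrange these variables differently.

```python
from math import comb

def template_size(n, d):
    """Number of coefficients u_i of a degree-d polynomial template
    in n variables (all monomials of total degree at most d)."""
    return comb(n + d, d)

def handelman_multipliers(m, M):
    """Number of products p_1^a1 ... p_m^am with a1 + ... + am <= M,
    i.e. the number of lambda variables for one identity."""
    return comb(m + M, M)

# A 2D system with a degree-2 template versus a 3D system with a degree-4
# template; a box in n dimensions has m = 2n bounding linear polynomials.
small = template_size(2, 2) + 3 * handelman_multipliers(4, 2)
large = template_size(3, 4) + 3 * handelman_multipliers(6, 4)
```

Under these counting assumptions, the 3D degree-4 setting needs more than ten times as many decision variables as the 2D degree-2 setting, which is in line with the slowdown observed on the last benchmark.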


6 Conclusion 


We have presented PBTS, a novel approach to over-approximate flowpipes of
nonlinear systems with polynomial dynamics. The benefit of using BTs is that
they are time-independent and hence cannot be stretched or deformed by time.
Moreover, this approach only results in a small number of BTs which are
sufficient to form a tight over-approximation for the flowpipe, hence the safety
verification with PBT can be very efficient.

Abstract. Reachability analysis is difficult for hybrid automata with
affine differential equations, because the reach set needs to be approxi- 
mated. Promising abstraction techniques usually employ interval meth- 
ods or template polyhedra. Interval methods account for dense time and 
guarantee soundness, and there are interval-based tools that overapprox- 
imate affine flowpipes. But interval methods impose bounded and rigid 
shapes, which make refinement expensive and fixpoint detection difficult. 
Template polyhedra, on the other hand, can be adapted flexibly and can 
be unbounded, but sound template refinement for unbounded reacha- 
bility analysis has been implemented only for systems with piecewise 
constant dynamics. We capitalize on the advantages of both techniques, 
combining interval arithmetic and template polyhedra, using the former 
to abstract time and the latter to abstract space. During a CEGAR 
loop, whenever a spurious error trajectory is found, we compute addi- 
tional space constraints and split time intervals, and use these space-time 
interpolants to eliminate the counterexample. Space-time interpolation 
offers a lazy, flexible framework for increasing precision while guarantee- 
ing soundness, both for error avoidance and fixpoint detection. To the 
best of our knowledge, this is the first abstraction refinement scheme for
the reachability analysis over unbounded and dense time of affine hybrid 
systems, which is both sound and automatic. We demonstrate the effec- 
tiveness of our algorithm with several benchmark examples, which cannot 
be handled by other tools. 


1 Introduction 


Formal verification techniques can be used to either provide rigorous guarantees 
about the behaviors of a critical system, or detect instances of violating behavior 
if such behaviors are possible. Formal verification has become widely used in the 
design of software and digital hardware, but has yet to show a similar success for 
physical and cyber-physical systems. One of the reasons for this is a scarcity of 
suitable algorithmic verification tools, such as model checkers, which are formally 
sound, precise, and scale reasonably well. In this paper, we propose a novel 
verification algorithm that meets these criteria for systems with piecewise affine 
dynamics. The performance of the approach is illustrated experimentally on a 
number of benchmarks. Since systems with affine dynamics have been studied 
before, we first describe why the available methods and tools do not handle this 
class of systems sufficiently well, and then describe our approach and its core
contributions. 


Previous Approaches. The algorithmic verification of systems with continuous 
or discrete-continuous (hybrid) dynamics is a hard problem both in theory 
and practice. For piecewise constant dynamics (PCD), the continuous succes- 
sor states (a.k.a. flow pipe) can be computed exactly, and the complexity is 
exponential in the number of variables [17,19]. While in principle, any dynam- 
ics can be approximated arbitrarily well by PCD systems using an approach 
called hybridization [20], this requires partitioning of the state space, which 
often leads to prohibitive computational costs. For piecewise affine dynamics 
(PWA), one-step successors can be computed approximately using complex set 
representations. However, all published approaches suffer either from a possi- 
bly exponential increase in the complexity of the set representation, or from a 
possibly exponential increase in the approximation error as the considered time 
interval increases; this will be argued in detail in Sect. 4. 

In addition to these theoretical obstacles, we note the following practical 
obstacles for the available tools and their performance in experiments. The only 
available model checkers that are (i) sound (i.e., they compute provable dense- 
time overapproximations), (ii) unbounded (i.e., they overapproximate the flow- 
pipe for an infinite time horizon), and (iii) arbitrarily precise (i.e., they support 
precision refinement) are, with one exception, limited to PCD systems, namely, 
HyTech [18], PHAVer [13], and Lyse [7]. The tool Ariadne [6] can deal with affine 
dynamics and is sound, unbounded, and precise. However, Ariadne discretizes 
the reachable state space with a rectangular grid. This invariably leads to an 
exponential complexity in terms of the number of variables. Other tools that are 
applicable to PWA systems do not meet our criteria in that they are either not 
formally sound (e.g., CORA [2], SpaceEx [15]), not arbitrarily precise because 
of templates or particular data structures (e.g., SpaceEx, Flow* [8], CORA), 
or limited to bounded model checking (e.g., dReach [24], Flow*). All the above 
tools exhibit fatal limitations in scalability or precision on standard PWA bench- 
marks; they typically work only on well-chosen examples. Note that while these 
tools do not meet the criteria we advance in this paper, they of course have 
strengths in other areas handling nonlinear and nondeterministic dynamics. 


Our Approach. We view iterative abstraction refinement as critical for sound- 
ness and precision management, and fixpoint detection as critical for eval- 
uating unbounded properties. We implement, for the first time, a CEGAR 
(counterexample-guided abstraction refinement) scheme in combination with a 
fixpoint detection criterion for PWA systems. Our abstraction refinement scheme 
manages complexity and precision trade-offs in a flexible way by decoupling time 
from space: the dense timeline is partitioned into a sequence of intervals that 
are refined individually and lazily, by splitting intervals, to achieve the necessary 
precision and detect fixpoints; state sets are overapproximated using template 
polyhedra that are also refined individually and lazily, by adding normal direc- 
tions to templates; and both refinement processes are interleaved for optimal 
results, while maintaining soundness with each step. A similar approach was 
recently proposed for the limited class of PCD systems [7]; this paper can be
seen as an extension of the approach to the class of piecewise affine dynamics.

With each iteration of the CEGAR loop, a spurious counterexample is 
removed by computing a proof of infeasibility in terms of a sequence of linear 
constraints in space and interval constraints in time, which we call a sequence 
of space-time interpolants. We use linear programming to construct a suitable 
sequence of space-time interpolants and check for fixpoints. If a fixpoint check fails,
we increase the time horizon by adding new intervals. The separation of time 
from space gives us the flexibility to explore different refinement strategies. Fine- 
tuning the iteration of space refinement (adding template directions), time refine- 
ment (splitting intervals), and fixpoint checking (adding intervals), we find that 
it is generally best to prefer fewer time intervals over fewer space constraints. 
Based on performance evaluation, we even expand individual time intervals when
this is possible without sacrificing the necessary precision for removing a
counterexample.


2 Motivating Example 


The ordinary differential equation over the variables x and y 


    $\dot{x} = 0.1x - y + 1.8$        (1)
    $\dot{y} = x + 0.1y - 2.2$


moves counterclockwise around the point (2,2) in an outward spiral. We center 
a box B (of side 0.92) on the same point and place a diagonal segment S close to 
the bottom right corner of B, without touching it (between (2, 1) and (3.5, 2); see 
Fig. 1). Then, we consider the problem of proving that every trajectory starting 
from any point in S never hits B. This is a time-unbounded reachability problem 
for a hybrid automaton with piecewise affine dynamics and two control modes. 
The first mode has the dynamics above (Eq. 1) and S as initial region. It has a 
transition to a second mode, which in its turn has B as invariant. The second 
mode is a bad mode, which all trajectories indeed avoid. 
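As a quick sanity check of the dynamics in Eq. 1, the following Python sketch evaluates the closed-form solution of the ODE. It rests on an observation that is ours rather than the text's: the matrix of the linear part, [[0.1, -1], [1, 0.1]], has as its exponential a rotation by angle t scaled by e^{0.1t}, and (2, 2) is the equilibrium. The names `state_at` and `radius` are hypothetical helpers.

```python
import math

CENTER = (2.0, 2.0)  # equilibrium of Eq. 1: both right-hand sides vanish here

def state_at(x0, y0, t):
    """Closed-form solution of Eq. 1 at time t, starting from (x0, y0)."""
    dx, dy = x0 - CENTER[0], y0 - CENTER[1]
    growth = math.exp(0.1 * t)           # outward scaling
    c, s = math.cos(t), math.sin(t)      # counterclockwise rotation by angle t
    return (CENTER[0] + growth * (c * dx - s * dy),
            CENTER[1] + growth * (s * dx + c * dy))

def radius(x, y):
    """Distance of (x, y) from the equilibrium."""
    return math.hypot(x - CENTER[0], y - CENTER[1])

if __name__ == "__main__":
    start = (3.5, 2.0)  # one endpoint of the segment S
    for t in (0.0, math.pi / 2, math.pi, 2 * math.pi):
        x, y = state_at(*start, t)
        print(f"t={t:.2f}  x={x:.3f}  y={y:.3f}  radius={radius(x, y):.3f}")
```

Under this closed form, the distance to (2, 2) grows by the factor e^{0.1t} and one revolution takes 2π (about 6.28) time units.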

We tackle the reachability problem by abstraction refinement. In particular, 
we aim at automatically constructing an enclosure for the flowpipe—i.e., for the 
set of trajectories from S—which (i) avoids the bad state B and (ii) covers the 
continuous timeline up to infinity. Figure 1 shows three abstractions that result 
from different strategies for refining an initial space partition (i.e., template) and 
time partition (i.e., sequence of time intervals). All three refinement schemes 
start by enclosing S with an initial template polyhedron P, and then transforming
P into a sequence of abstract flowpipe sections $\mathit{intflow}^{[\underline{t},\overline{t}]}(P)$, one for each
interval $[\underline{t},\overline{t}]$ of an initial partitioning of the unbounded timeline. The
computation of new flowpipe sections stops when a fixpoint is reached, i.e., we reach a
time threshold $t^*$ whose flowpipe section closes a cycle with $\mathit{intflow}^{t^*}(P) \subseteq P$, a
sufficient condition for any further flowpipe section to be contained within the
union of previously computed sections.
Fig. 1. Comparison of abstraction refinement methods for the ODE in Eq. 1, the
segment S as initial region, and the box B as bad region. The polyhedron P is the template
polyhedron of S, and the gray polyhedra are the flowpipe sections $\mathit{intflow}^{[\underline{t},\overline{t}]}(P)$.


Refinement scheme (a) sticks to a fixed octagonal template P—i.e., to the 
normals of a regular octagon—and iteratively halves all time intervals until every 
flowpipe section avoids the bad set B. This is achieved at interval width 1/64, but 
the computation does not terminate because no fixpoint is reached. Refinement 
scheme (b) splits time similarly but also computes a different, more accurate 
template for every iteration: first, an interval $[\underline{t},\overline{t}]$ is halved until it admits a
halfspace interpolant, i.e., a halfspace H such that $S \subseteq H$ and
$\mathit{intflow}^{[\underline{t},\overline{t}]}(H) \cap B = \emptyset$; then, a maximal set of linearly independent directions is chosen as template
from the normals of the obtained halfspaces. Refinement scheme (b) succeeds
at interval width 1/16 to avoid B and reach a fixpoint; the latter at time 6.25,
with $\mathit{intflow}^{6.25}(P) \subseteq P$. Refinement scheme (c) modifies (b) by optimizing the
refinement of the time partition: instead of halving time intervals, the maximal 
intervals which admit halfspace interpolants are chosen. This scheme produces 
a nonuniform time partitioning with an average interval width of about 1/8, 
discovers five template directions, and finds a fixpoint in fewer steps. 
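The idea behind scheme (c), choosing maximal intervals instead of halving, can be sketched as a greedy search with bisection on the right endpoint. This is only an illustration under stated assumptions: `admits(a, b)` is a hypothetical oracle standing in for "the flowpipe section over [a, b] admits a halfspace interpolant", it is assumed to be monotone in the interval width, and the tolerance `eps` is our own parameter.

```python
def maximal_partition(t_end, admits, eps=1e-3):
    """Split [0, t_end] greedily into intervals accepted by `admits`.

    `admits(a, b)` is a hypothetical oracle; it is assumed that whenever
    [a, b] is accepted, every shorter interval starting at a is accepted too.
    """
    parts, a = [], 0.0
    while a < t_end:
        if admits(a, t_end):              # the whole remainder is acceptable
            parts.append((a, t_end))
            break
        lo = min(a + eps, t_end)          # shortest interval we are willing to try
        if not admits(a, lo):
            raise ValueError(f"no interpolant at width {eps} from t={a}")
        hi = t_end                        # known to be rejected
        while hi - lo > eps:              # bisection on the right endpoint
            mid = 0.5 * (lo + hi)
            if admits(a, mid):
                lo = mid
            else:
                hi = mid
        parts.append((a, lo))
        a = lo
    return parts
```

For instance, with a toy oracle that accepts any interval of width at most 0.3, `maximal_partition(1.0, lambda a, b: b - a <= 0.3)` returns four contiguous intervals, each within `eps` of the maximal admissible width except the last.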

Each iteration of the abstraction refinement loop consists of first abstracting 
the initial region into a template polyhedron, second solving the differential equa- 
tion into a sequence of interval matrices, and finally transforming the template 
polyhedron using each of the interval matrices. We represent each transformation 
symbolically, by means of its support function. Then, we verify (i) the separation 
between every support function and the bad region, and (ii) the containment of 
any support function in the initial template polyhedron. The separation prob- 
lem amounts to solving one LP, and the inclusion problem amounts to solving 
an LP in each template direction. If the separation fails, then we independently 
bisect each time interval that does not admit halfspace interpolants and expand each
that does, until all are proven separated. Together, these halfspace interpolants 
form an infeasibility proof for the counterexample: a space-time interpolant. 
We forward the resulting new time intervals and halfspaces to the abstraction 
generator, and repeat, using the refined partitioning and the augmented
template. If the inclusion fails, then we increase the time horizon by some amount
$\Delta$, and repeat. Once we succeed with both separation and inclusion, the system
is proved safe.

This example shows the advantage of lazily refining both the space parti- 
tioning (i.e., the template) by adding directions, and the time partitioning, by 
splitting intervals. 


3 Hybrid Automata with Piecewise Affine Dynamics 


A hybrid automaton with piecewise affine dynamics consists of an n-dimensional
vector x of real-valued variables and a finite directed multigraph (V, E), the
control graph. We call the vertices $v \in V$ the control
modes, and the edges $e \in E$ the control switches. We decorate each mode $v \in V$
with an initial condition $Z_v \subseteq \mathbb{R}^n$, a nonnegative invariant condition $I_v \subseteq \mathbb{R}^n_{\ge 0}$,
and a flow condition given by the system of ordinary differential equations

    $\dot{x} = A_v x + b_v$.        (2)

We decorate each switch $e \in E$ with a guard condition $G_e \subseteq \mathbb{R}^n$ and an update
condition given by the difference equation $x := R_e x + s_e$. All constraints I, G, and
Z are conjunctions of rational linear inequalities, A and R are constant matrices,
and b and s constant vectors of rational coefficients. In this paper, whenever an
indexing of modes and switches is clear from the context, we index the respective
constraints and transformations similarly, e.g., we abbreviate $A_{v_i}$ with $A_i$.

A trajectory is a possibly infinite sequence of states $(v, x) \in V \times \mathbb{R}^n$ repeatedly
interleaved first by a switching time $t \in \mathbb{R}_{\ge 0}$ and then by a switch $e \in E$

    $(v_0, x_0)\,t_0\,(v_0, y_0)\,e_0\,(v_1, x_1)\,t_1\,(v_1, y_1)\,e_1 \dots$        (3)

for which there exists a sequence of solutions $\psi_0, \psi_1, \dots : \mathbb{R} \to \mathbb{R}^n$ such that
$\psi_i(0) = x_i$, $\psi_i(t_i) = y_i$ and they satisfy (i) the invariant conditions $\psi_i(t) \in I_i$
and (ii) the flow conditions $\dot{\psi}_i(t) = A_i \psi_i(t) + b_i$, for all $t \in [0, t_i]$. Moreover,
$x_0 \in Z_0$, every switch $e_i$ has source $v_i$, destination $v_{i+1}$, and the respective states
satisfy (i) the guard condition $y_i \in G_i$ and (ii) the update $x_{i+1} = R_i y_i + s_i$. The
maximal set of its trajectories is the semantics of the hybrid automaton, which
is safe if none of them contains a special bad mode.

Every hybrid automaton with affine dynamics can be transformed into an 
equivalent hybrid automaton with linear dynamics, i.e., the special case where 
b = 0 on every mode. We obtain such a transformation by adding one extra variable
y, rewriting the flow of every mode into $\dot{x} = Ax + by$, and forcing y to be always
equal to 1, i.e., invariant y = 1 and flow $\dot{y} = 0$ on every mode and update y' = y
on every switch. For this reason, in the following sections we discuss w.l.o.g. the
reachability analysis of hybrid automata with linear dynamics. 
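The affine-to-linear transformation described above amounts to appending b as an extra column and a zero row to A. A minimal sketch, in which `homogenize` and `matvec` are hypothetical helper names and matrices are plain nested lists:

```python
def homogenize(A, b):
    """Return the (n+1)x(n+1) matrix of the linear system over (x, y).

    The first n rows encode dx/dt = A x + b y; the last row encodes dy/dt = 0.
    """
    n = len(A)
    rows = [list(A[i]) + [b[i]] for i in range(n)]
    rows.append([0.0] * (n + 1))
    return rows

def matvec(M, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Eq. 1 as an example: the derivative at (x, y) = (3.5, 2), with the extra variable at 1.
A = [[0.1, -1.0], [1.0, 0.1]]
b = [1.8, -2.2]
derivative = matvec(homogenize(A, b), [3.5, 2.0, 1.0])
```

The last component of `derivative` is always zero, which is what keeps the extra variable at its initial value 1 along every flow.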
4 Time Abstraction Using Interval Arithmetic


We abstract the reach set of the hybrid automaton with a union of convex 
polyhedra. In particular, we abstract the states that are reachable in a mode 
using a finite sequence of images of the initial region over a time partitioning, 
until a completeness threshold is reached. Thereafter, we compute the template 
polyhedron of each of the images that can take a switch. Then, we repeat in the 
destination mode and we continue until a fixpoint is found. 

Precisely, a time partitioning T is a (possibly infinite) set of disjoint closed
time intervals whose union is a single (possibly open) interval. For a finite set of
directions $D \subseteq \mathbb{R}^n$, the D-polyhedron of a closed convex set X is the tightest
polyhedral enclosure whose facet normals are in D. In the following, we associate
every mode v to a template $D_v$ and a time partitioning $T_v$ of the time axis $\mathbb{R}_{\ge 0}$,
we employ interval arithmetic for abstracting the continuous dynamics (Sect.
4.1), and on top of it we develop a procedure for hybrid dynamics (Sect. 4.2).


4.1 Continuous Dynamics 


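We consider w.l.o.g. a mode with ODE reduced to the linear form $\dot{x} = A_v x$,
invariant $I_v$, and a given time interval $[\underline{t},\overline{t}]$. Every linear ODE $\dot{x} = Ax$ has the
unique solution

    $\psi(t) = \exp(At)\,\psi(0)$.        (4)

It follows (see also [16]) that the set of states reachable in v after exactly t time
units from an initial region X is

    $\mathit{flow}_v^{t}(X) \overset{\mathrm{def}}{=} \exp(A_v t)X \cap \bigcap_{0 \le \tau \le t} \exp(A_v(t-\tau))\,I_v$.        (5)

Then, the flowpipe section over the time interval $[\underline{t},\overline{t}]$ is

    $\mathit{flow}_v^{[\underline{t},\overline{t}]}(X) \overset{\mathrm{def}}{=} \bigcup\{\mathit{flow}_v^{t}(X) \mid t \in [\underline{t},\overline{t}]\}$.        (6)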

We note three straightforward but consequential properties of the reach set:
(i) The accuracy of any convex abstraction depends on the size of the time
interval: While $\mathit{flow}_v^{t}(X)$ is convex for convex X, this is generally not the case
for $\mathit{flow}_v^{[\underline{t},\overline{t}]}(X)$. (ii) We can prune the time interval whenever we detect that the
reach set no longer overlaps with the invariant: If for any $t^* \ge 0$, $\mathit{flow}_v^{t^*}(X) = \emptyset$,
then for all $t \ge t^*$, $\mathit{flow}_v^{t}(X) = \emptyset$ and $\mathit{flow}_v^{[0,t]}(X) = \mathit{flow}_v^{[0,t^*]}(X)$. (iii) We can
prune the time interval whenever we detect containment in the initial states: If
$\mathit{flow}_v^{t^*}(X) \subseteq X$, then $\mathit{flow}_v^{[0,\infty)}(X) = \mathit{flow}_v^{[0,t^*]}(X)$.

For given A and t, the matrix $\exp(At)$ can be computed with arbitrary, but
only finite, accuracy. We resolve this problem by computing a rational interval
matrix $[\underline{M},\overline{M}]$, which we denote $\mathit{intexp}(A,\underline{t},\overline{t})$, such that for all $t \in [\underline{t},\overline{t}]$ we have
element-wise that

    $\exp(At) \in \mathit{intexp}(A,\underline{t},\overline{t})$.        (7)
This interval matrix can be derived efficiently with a variety of methods [25],
e.g., using a guaranteed ODE solver or using interval arithmetic. The width
of the interval matrix can be made arbitrarily small at the price of increasing
the number of computations and the size of the representation of the rational
numbers. In our approach, we do not rely on a fixed accuracy of the interval
matrix. Instead, we require that the accuracy increases as the width of the time
interval goes to zero. That way, we don't need to introduce an extra parameter.
To ensure progress in our refinement loop, we require that the interval matrix
decreases monotonically when we split the time interval. Formally, if $[\underline{t},\overline{t}] \subseteq [\underline{u},\overline{u}]$
we require the element-wise inclusion $\mathit{intexp}(A,\underline{t},\overline{t}) \subseteq \mathit{intexp}(A,\underline{u},\overline{u})$. This can
be ensured by intersecting the interval matrices with the original interval matrix
after time splitting.
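To make the requirements on the interval matrix concrete, the sketch below encloses exp(At) over a time interval for the special family A = [[a, -b], [b, a]] with b > 0 (which includes the linear part of Eq. 1), where the exponential has the closed form e^{at} times a rotation by angle bt. This is an illustration only: it uses plain floating point without outward rounding, so unlike the rational interval matrices described above it is not sound in the strict sense, and the function names are hypothetical.

```python
import math

def cos_range(lo, hi):
    """Range of cos over [lo, hi], up to floating-point rounding."""
    vals = [math.cos(lo), math.cos(hi)]
    k = math.ceil(lo / math.pi)
    while k * math.pi <= hi:
        # cos is +1 at even multiples of pi and -1 at odd multiples
        vals.append(1.0 if k % 2 == 0 else -1.0)
        k += 1
    return min(vals), max(vals)

def sin_range(lo, hi):
    """Range of sin over [lo, hi], using sin(x) = cos(x - pi/2)."""
    return cos_range(lo - math.pi / 2, hi - math.pi / 2)

def imul(p, q):
    """Product of two intervals given as (lower, upper) pairs."""
    prods = [p[0] * q[0], p[0] * q[1], p[1] * q[0], p[1] * q[1]]
    return min(prods), max(prods)

def intexp_spiral(a, b, t_lo, t_hi):
    """Element-wise enclosure of exp(A t) for all t in [t_lo, t_hi].

    Assumes A = [[a, -b], [b, a]] with b > 0 and t_lo <= t_hi.
    Each entry of the result is a (lower, upper) pair.
    """
    e = tuple(sorted((math.exp(a * t_lo), math.exp(a * t_hi))))
    c = imul(e, cos_range(b * t_lo, b * t_hi))
    s = imul(e, sin_range(b * t_lo, b * t_hi))
    neg_s = (-s[1], -s[0])
    return [[c, neg_s], [s, c]]
```

Because every step is inclusion-monotone, shrinking the time interval can only shrink the enclosure, which mirrors the monotonicity requirement stated above.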

While the mapping with interval matrices is in general not convex [29], we can 
simplify the problem by assuming that all points of X are in the positive orthant. 
As long as X is bounded from below, this condition can be satisfied by inducing 
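an appropriate coordinate change. Under the assumption that $X \subseteq \mathbb{R}^n_{\ge 0}$,

    $[\underline{M},\overline{M}](X) = \{y \in \mathbb{R}^n \mid \underline{M}x \le y \le \overline{M}x \text{ and } x \in X\}$.        (8)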
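For a single nonnegative point x, Eq. 8 describes the box between the images of x under the lower and upper matrix. A small sketch of that special case, with `interval_map` as a hypothetical helper on nested lists:

```python
def interval_map(M_lo, M_hi, x):
    """Bounds (lower, upper) on M x for all M_lo <= M <= M_hi, given x >= 0."""
    if any(xi < 0 for xi in x):
        raise ValueError("Eq. 8 assumes a point in the nonnegative orthant")
    lower = [sum(m * xi for m, xi in zip(row, x)) for row in M_lo]
    upper = [sum(m * xi for m, xi in zip(row, x)) for row in M_hi]
    return lower, upper

# Example with a 2x2 interval matrix and the point (1, 3).
lower, upper = interval_map([[1.0, -2.0], [0.0, 1.0]],
                            [[2.0, -1.0], [1.0, 1.0]],
                            [1.0, 3.0])
```

The nonnegativity check matters: for a point with a negative coordinate, the lower matrix would no longer yield the lower bound, which is why the coordinate change mentioned above is needed.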


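Combining the above results, we obtain a convex abstraction of the flowpipe
over a time interval as

    $\mathit{intflow}_v^{[\underline{t},\overline{t}]}(X) \overset{\mathrm{def}}{=} \mathit{intexp}(A_v,\underline{t},\overline{t})\,X \cap I_v$.        (9)

The abstraction is conservative in the sense that $\mathit{flow}_v^{[\underline{t},\overline{t}]}(X) \subseteq \mathit{intflow}_v^{[\underline{t},\overline{t}]}(X)$.
On the other hand, the longer the time interval, the coarser the abstraction.
For this reason, we construct an abstraction of the flowpipe in terms of a union
of convex approximations over a time partitioning. The abstract flowpipe over
the time partitioning T is

    $\mathit{intflow}_v^{T}(X) \overset{\mathrm{def}}{=} \bigcup\{\mathit{intflow}_v^{[\underline{t},\overline{t}]}(X) \mid [\underline{t},\overline{t}] \in T\}$.        (10)

Again, this is conservative w.r.t. the concrete flowpipe, i.e., for all time
partitionings T it holds that $\mathit{flow}_v^{\bigcup T}(X) \subseteq \mathit{intflow}_v^{T}(X)$. Moreover, it is conservative
w.r.t. any refinement of T, i.e., if the time partitioning U refines T, that is, $\bigcup U = \bigcup T$
and $\forall [\underline{u},\overline{u}] \in U \colon \exists [\underline{t},\overline{t}] \in T \colon [\underline{u},\overline{u}] \subseteq [\underline{t},\overline{t}]$, then $\mathit{intflow}_v^{U}(X) \subseteq \mathit{intflow}_v^{T}(X)$.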
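The refinement relation between time partitionings can be checked directly from its definition. The sketch below is restricted, as an assumption of ours, to finite partitionings given as sorted lists of closed intervals `(lo, hi)`; the function name `refines` and the tolerance `tol` are hypothetical.

```python
def refines(U, T, tol=1e-12):
    """Check that the finite partitioning U refines the finite partitioning T.

    U and T are sorted lists of closed intervals (lo, hi) that each cover one
    contiguous time span. U refines T if both cover the same span and every
    interval of U lies inside some interval of T.
    """
    same_span = (abs(U[0][0] - T[0][0]) <= tol
                 and abs(U[-1][1] - T[-1][1]) <= tol)
    nested = all(any(t_lo - tol <= u_lo and u_hi <= t_hi + tol
                     for t_lo, t_hi in T)
                 for u_lo, u_hi in U)
    return same_span and nested
```

Splitting intervals, as the refinement loop does, always produces a partitioning that passes this check against the one it was derived from.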


4.2 Hybrid Dynamics 


We embed the flowpipe abstraction routine into a reachability algorithm that 
accounts for the switching induced by the hybrid automaton. The discrete post 
operator is the image of a set $Y \subseteq \mathbb{R}^n$ through a switch $e \in E$

    $\mathit{jump}_e(Y) \overset{\mathrm{def}}{=} R_e(Y \cap G_e) \oplus \{s_e\}$.        (11)

We explore the hybrid automaton constructing a set of abstract trajectories,
namely sequences of abstract states interleaved by time intervals and switches

    $(v_0, X_0)\,[\underline{t}_0,\overline{t}_0]\,(v_0, Y_0)\,e_0\,(v_1, X_1)\,[\underline{t}_1,\overline{t}_1]\,(v_1, Y_1)\,e_1 \dots$        (12)
input : Template $\{D_v\}$ and partitioning $\{T_v\}$ indexed by V
output: Optionally an abstract trajectory (counterexample)

foreach $v \in V$ with nonempty $Z_v$ do
    push $(v, Z_v)[0, \Delta]$ into the stack W;
    add the $D_v$-polyhedron of $Z_v$ to $P_v$;
while W is not empty do
    pop ... $(v, X)[\underline{t},\overline{t}]$ from W;
    P ← $D_v$-polyhedron of X;
    if v is bad and $P \cap I_v$ is nonempty then    // check counterexample
        return ... (v, X);
    foreach $t^* \in \{\underline{t}+\delta, \underline{t}+2\delta, \dots, \overline{t}\}$ do    // find completeness threshold
        if $\mathit{intflow}_v^{t^*}(P) \subseteq P_v$ then break;
    if $t^* = \overline{t}$ and $\mathit{intflow}_v^{t^*}(P) \not\subseteq P_v$ then    // otherwise extend time horizon
        push ... $(v, X)[\overline{t}, \overline{t}+\Delta]$ into W;
    foreach $[\underline{u},\overline{u}] \in T_v$ and $[\underline{u},\overline{u}] \cap [\underline{t}, t^*] \neq \emptyset$ do    // construct flowpipe
        Y ← $\mathit{intflow}_v^{[\underline{u},\overline{u}]}(P)$;
        foreach $e \in E$ with source v and destination v' do
            X' ← $\mathit{jump}_e(Y)$;
            if $X' \subseteq P_{v'}$ then continue;
            push ... $(v, X)[\underline{u},\overline{u}](v, Y)\,e\,(v', X')[0, \Delta]$ into W;
            add the $D_{v'}$-polyhedron of X' to $P_{v'}$;


Algorithm 1. Reachability procedure.


where $X_0, Y_0, \dots \subseteq \mathbb{R}^n$ are nonempty sets of states that comply with
template $\{D_v\}$ and partitioning $\{T_v\}$ in the following sense. First, $X_0 = Z_0$ and
$X_{i+1} = \mathit{jump}_i(Y_i)$ for all $i \ge 0$. Second, $Y_i = \mathit{intflow}_i^{[\underline{t}_i,\overline{t}_i]}(P_i)$ for all $i \ge 0$, where
$P_i$ is the $D_i$-polyhedron of $X_i$ and $[\underline{t}_i,\overline{t}_i] \in T_i$. The maximal set of abstract
trajectories, the abstract semantics induced by $\{D_v\}$ and $\{T_v\}$, overapproximates
the concrete semantics in the sense that every concrete trajectory (see Eq. 3)
has an abstract trajectory that subsumes it, i.e., modes and switches match,
$x_i \in X_i$, $t_i \in [\underline{t}_i,\overline{t}_i]$, and $y_i \in Y_i$, for all $i \ge 0$.

Computing the abstraction involves several difficulties. First, the trajectories 
might be not finitary. Indeed, this is unsolvable in theory, because the reachabil- 
ity problem is undecidable [21]. Second, the post operators are hard to compute. 
In particular, obtaining the sets X and Y in terms of conjunctions of linear 
inequalities in $\mathbb{R}^n$ requires eliminating quantifiers. In Algorithm 1, we present a
procedure (which does not necessarily terminate) for tackling the first problem. 
In the next section, we show how to tackle the second using support functions. 

We employ Algorithm 1 to explore the tree of abstract trajectories. We store
in the stack W the leaves to process ... (v, X), followed by a candidate interval
$[\underline{t},\overline{t}]$. For each leaf, we retrieve P, the template polyhedron of X. If it leads
to a bad mode, we return; otherwise we search for a completeness threshold $t^*$
between $\underline{t}$ (excluded) and $\overline{t}$, checking for inclusion in the union of visited
polyhedra $P_v$. In case of failure, we extend the time horizon by $\Delta$ and push the next
candidate to the stack. Then, we partition the time between $\underline{t}$ and $t^*$, construct
the flowpipe, and process switching. Upon each successful switch, we augment
$P_{v'}$ with the $D_{v'}$-polyhedron of the switching region X', without storing
redundant polyhedra. Notably, the latter operation is efficient because all polyhedra
comply with the same template. For the same reason, we obtain efficient
inclusion checks, which we implement by first computing the template polyhedron
of the left hand side, and then comparing the constant terms of the respective
linear inequalities.
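The inclusion check described above reduces to a coefficient-wise comparison once both polyhedra are expressed over the same template. A minimal sketch, assuming (our convention, not the paper's) that a template polyhedron is stored as the list of right-hand sides b_i of the inequalities d_i · x ≤ b_i, in the order of the shared direction list, with `math.inf` marking an unconstrained direction:

```python
import math

def template_contains(outer, inner):
    """True if the template polyhedron `inner` is included in `outer`.

    Both arguments list the bounds b_i of d_i . x <= b_i for the same ordered
    template directions. The comparison decides inclusion when `inner` stores
    tight bounds (its support function values) and is nonempty.
    """
    if len(outer) != len(inner):
        raise ValueError("polyhedra must share one template")
    return all(bi <= bo for bi, bo in zip(inner, outer))

# Box template with directions (1,0), (-1,0), (0,1), (0,-1).
big = [3.0, 0.0, 3.0, 0.0]            # 0 <= x <= 3, 0 <= y <= 3
small = [2.0, -1.0, 2.0, -1.0]        # 1 <= x <= 2, 1 <= y <= 2
unbounded = [math.inf, 0.0, 3.0, 0.0] # x >= 0, 0 <= y <= 3
```

Here `template_contains(big, small)` and `template_contains(unbounded, big)` hold, while `template_contains(small, big)` does not.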

In conclusion, this reachability procedure takes a template $\{D_v\}$ and a
partitioning $\{T_v\}$ and constructs a tree of reachable sets of states X and Y. It
manipulates them through the post operators and overapproximates them into
template polyhedra. In the next section, we discuss how to efficiently represent
X and Y, so as to efficiently compute their template polyhedra. In Sect. 6 we
discuss how to discover appropriate $\{D_v\}$ and $\{T_v\}$, so as to eliminate spurious
counterexamples.


5 Space Abstraction Using Support Functions 


Abstracting away time left us with the task of representing the state space of the 
hybrid automaton, namely the space of its variable valuations. Such sets consist
of polyhedra emerging from operations such as intersections, Minkowski sums, 
and linear maps with simple or interval matrices. In this section, we discuss 
how to represent precisely all sets emerging from any of these operations by 
means of their support functions (Sect. 5.1) and then how to abstract them into 
template polyhedra (Sect. 5.2). In the next section, we discuss how to refine the 
abstraction. 


5.1 Support Functions 


The support function of a closed convex set $X \subseteq \mathbb{R}^n$ in direction $d \in \mathbb{R}^n$ consists
of the maximum scalar product of d over X

    $\rho_X(d) = \sup\{d^T x \mid x \in X\}$,        (13)

and, indeed, uniquely represents any closed convex set [28]. Classic work on the
verification of hybrid automata with affine dynamics has posed a framework for
the construction of support functions from basic set operations, but under the
assumption of boundedness and nonemptiness of the represented set, and with
approximated intersection [16]. Indeed, if the set is empty then its support
function is $-\infty$, while if it is unbounded and d points toward a direction of recession it is
$+\infty$, making the framework end up in undefined values. Such conditions turn
out to be limiting in our context, first because we find it desirable to represent
unbounded sets so as to accelerate the convergence to a fixpoint of the abstraction
procedure, but most importantly because when encoding support functions for
long abstract trajectories we might not be aware whether its concretization is
infeasible. Checking this is a crucial element of a counterexample-guided
abstraction refinement routine.

Recent work on the verification of hybrid automata with constant dynamics,
i.e., with flows defined by constraints on the derivative only, provides us with
a generalization of the classic support function framework which relaxes away
the assumptions of boundedness and nonemptiness and yields precise
intersection [7]. The framework encodes combinations of convex sets of states into LP
(linear programs) which enjoy strong duality with their support function.
Similarly, we encode the support function in direction d of any set X into the LP

    minimize   $c^T \lambda$
    subject to $A\lambda = Bd$,        (14)

over the nonnegative vector of variables $\lambda$. The LP is dual to $\rho_X(d)$, which is
to say that if the LP is infeasible then X is unbounded in direction d, and if
the LP is unbounded then X is the empty set. Moreover, if the LP has bounded
solution so does $\rho_X(d)$ and the solutions coincide.

The construction is inductive on operations between sets. For the base case,
we recall that from duality of linear programming the support function of a
polyhedron given by a system of inequalities $Px \le q$ is dual to the LP over
$\lambda \ge 0$

    minimize   $q^T \lambda$
    subject to $P^T \lambda = d$.        (15)
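The duality behind LP (15) can be checked by hand on a box, where both sides have closed forms. The sketch below is an illustration under our own setup: the box lo ≤ x ≤ hi is written as Px ≤ q with P = [I; -I] and q = (hi, -lo), the primal value is obtained by enumerating vertices, and the dual optimum is written down directly rather than computed by an LP solver. Function names are hypothetical.

```python
from itertools import product

def primal_support(lo, hi, d):
    """sup d.x over the box lo <= x <= hi, by enumerating its vertices."""
    return max(sum(di * vi for di, vi in zip(d, v))
               for v in product(*zip(lo, hi)))

def dual_value(lo, hi, d):
    """Optimal value of LP (15) for the box written as P x <= q.

    With P = [I; -I] and q = (hi, -lo), the constraint P^T lambda = d reads
    lam_plus - lam_minus = d, and the cheapest nonnegative choice is
    lam_plus = max(d, 0), lam_minus = max(-d, 0).
    """
    lam_plus = [max(di, 0.0) for di in d]
    lam_minus = [max(-di, 0.0) for di in d]
    return (sum(h * lp for h, lp in zip(hi, lam_plus))
            - sum(l * lm for l, lm in zip(lo, lam_minus)))

# The box B of Sect. 2: side 0.92, centered at (2, 2).
LO, HI = [1.54, 1.54], [2.46, 2.46]
```

For any direction d, `primal_support(LO, HI, d)` and `dual_value(LO, HI, d)` agree up to rounding, which is the strong duality the construction relies on.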


Then, inductively, we assume that for the set $X \subseteq \mathbb{R}^n$ we are given an LP
with the coefficients $A_X$, $B_X$, and $c_X$, and similarly for the set $Y \subseteq \mathbb{R}^n$. For
the support functions of $X \oplus Y$, $MX$, and $X \cap Y$ we respectively construct the
following LP over the nonnegative vectors of variables $\lambda$, $\mu$, $\alpha$, and $\beta$:

    minimize   $c_X^T \lambda + c_Y^T \mu$
    subject to $A_X \lambda = B_X d$ and $A_Y \mu = B_Y d$,        (16)

    minimize   $c_X^T \lambda$
    subject to $A_X \lambda = B_X M^T d$, and        (17)

    minimize   $c_X^T \lambda + c_Y^T \mu$
    subject to $A_X \lambda - B_X(\alpha - \beta) = 0$ and        (18)
               $A_Y \mu + B_Y(\alpha - \beta) = B_Y d$.

Such construction follows as a special case of [7], which we extend with the
support function of a map through an interval matrix.
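The inductive flavor of the construction can be illustrated without an LP solver by composing support functions as plain callables. The sketch below covers only the Minkowski sum and the linear map, using the standard identities that correspond to LPs (16) and (17); the intersection (18) is deliberately left out because it needs an actual optimization over the split of the direction. Boxes serve as base case, and all names are hypothetical.

```python
def box_support(lo, hi):
    """Support function of the box lo <= x <= hi."""
    return lambda d: sum(h * di if di >= 0 else l * di
                         for l, h, di in zip(lo, hi, d))

def minkowski_sum(rho_x, rho_y):
    """Support function of X (+) Y: rho_X(d) + rho_Y(d)."""
    return lambda d: rho_x(d) + rho_y(d)

def linear_map(M, rho_x):
    """Support function of M X: rho_X(M^T d)."""
    def rho(d):
        mt_d = [sum(M[i][j] * d[i] for i in range(len(M)))
                for j in range(len(M[0]))]
        return rho_x(mt_d)
    return rho

# Unit square, rotated by 90 degrees counterclockwise, then widened along x.
square = box_support([0.0, 0.0], [1.0, 1.0])
rotated = linear_map([[0.0, -1.0], [1.0, 0.0]], square)
widened = minkowski_sum(rotated, box_support([-1.0, 0.0], [1.0, 0.0]))
```

Nothing is evaluated until a direction is supplied, which is the same lazy style as delaying the LP solve until a template polyhedron is needed.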

The time abstraction of Sect. 4 additionally requires us to represent the map
of sets of states through interval matrices. Precisely, we are given a convex set of
nonnegative values $X \subseteq \mathbb{R}^n_{\ge 0}$, the coefficients for the respective LP, an interval
matrix $[\underline{M},\overline{M}] \subseteq \mathbb{R}^{n \times n}$, and we aim at computing the support function of all
values in X mapped by all matrices in $[\underline{M},\overline{M}]$. To this end, we define the LP

    minimize   $c_X^T \lambda$
    subject to $A_X \lambda + B_X(\underline{M}^T \mu - \overline{M}^T \nu) = 0$ and        (19)
               $-\mu + \nu = d$,

over the vectors $\lambda$, $\mu$, and $\nu$ of nonnegative variables. This linear program
corresponds to the dual of the interval matrix map in Eq. 8.
5.2 Computing Template Polyhedra


We represent all space abstractions X and Y in our procedure by their support 
functions. In particular, whenever set operations are applied, instead of solving 
the operation by removing quantifiers, we construct an LP. We delay solving it 
until we need to compute a template polyhedron. In that case, we compute the 
D-polyhedron of the set X by computing its support function in each of the 
directions in D, and constructing the intersection of halfspaces
$\bigcap\{d^T x \le \rho_X(d) \mid d \in D\}$.
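As a small illustration of computing a D-polyhedron from a support function, the sketch below uses a set whose support function is trivial to evaluate, the convex hull of finitely many points, instead of the LP encoding of Sect. 5.1. The octagonal template and the segment S are taken from the motivating example; the function names are hypothetical.

```python
import math

def support(points, d):
    """Support function of the convex hull of finitely many points."""
    return max(sum(di * pi for di, pi in zip(d, p)) for p in points)

def d_polyhedron(points, directions):
    """Pairs (d, rho(d)), one halfspace d . x <= rho(d) per template direction."""
    return [(d, support(points, d)) for d in directions]

# Normals of a regular octagon, as in refinement scheme (a) of Sect. 2.
OCTAGON = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4))
           for k in range(8)]

S = [(2.0, 1.0), (3.5, 2.0)]   # endpoints of the initial segment
P = d_polyhedron(S, OCTAGON)
```

Each bound in `P` is attained by one of the endpoints, so the resulting octagon is the tightest enclosure of S with these facet normals.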


6 Abstraction Refinement Using Space-Time Interpolants 


The reachability analysis of hybrid automata by means of the combination of 
interval arithmetic and support functions presented in Sects. 4 and 5 builds an 
overapproximation of the system dynamics. It is always sound for safety, but it 
may produce spurious counterexamples, due to an inherent lack of precision of 
the time abstraction and the polyhedral approximation. The level of precision 
is given by two factors, namely the choice of time partitioning and the choice 
of template directions, excluding the parameters for approximation of the expo- 
nential function, which we assume constant (see Sect. 4.1). In the following, we 
present a procedure to extract infeasibility proofs from spurious counterexam- 
ples. We produce them in the form of time partitions and bounding polyhedra, 
which we call space-time interpolants. Space-time interpolants can then be used 
to properly refine time partitioning and template directions. 

Consider the bounded path $v_0, e_0, v_1, e_1, \dots, v_k, e_k, v_{k+1}$ over the control
graph and a sequence of dwell time intervals $[\underline{t}_0,\overline{t}_0], [\underline{t}_1,\overline{t}_1], \dots, [\underline{t}_k,\overline{t}_k]$ emerging
from an abstract trajectory. We aim at extracting a sequence $X_0, X_1, \dots, X_{k+1}$
of (possibly nonconvex) polyhedra and a sequence $T_0, T_1, \dots, T_k$ of refinements
of the respective dwell times such that $Z_0 \subseteq X_0$, $\mathit{jump}_0 \circ \mathit{intflow}_0^{T_0}(X_0) \subseteq X_1$,
..., $\mathit{jump}_k \circ \mathit{intflow}_k^{T_k}(X_k) \subseteq X_{k+1}$, and $X_{k+1} \cap I_{k+1}$ is empty. In other words,
we want every $X_{i+1}$ to contain all states that can enter mode $v_{i+1}$ after dwelling
on $v_i$ between $\underline{t}_i$ and $\overline{t}_i$ time, and the last to be separated from the invariant
of mode $v_{k+1}$. Containment is to hold inductively, namely $X_{i+1}$ has to contain
what is reachable from $X_i$, and the time refinements T are to be chosen in such
a way that containment holds in the abstraction. Then, we call the sequence
$X_0, T_0, X_1, T_1, \dots, X_k, T_k, X_{k+1}$ a sequence of space-time interpolants for the
path and the dwell times above.

We compute a sequence of space-time interpolants by alternating multiple 
strategies. First, for the given sequence of dwell times, we attempt to extract a 
sequence of halfspace interpolants using linear programming (Sect. 6.1). In case 
of failure, we iteratively partition the dwell times in sets of smaller intervals, 
separating nonswitching from switching times, until every combination of
intervals along the path admits halfspace interpolants (Sect. 6.2). We accumulate 
all halfspaces to form a sequence of unions of convex polyhedra that, together 
with the obtained time partitionings, will form a valid sequence of space-time 
interpolants. Finally, we refine the abstraction using the time partitionings and 
the outwards pointing directions of all computed halfspaces, in order to eliminate
the spurious counterexample (Sect. 6.3). 


6.1 Halfspace Interpolation 


Halfspace interpolants are the special case of space-time interpolants where every 
polyhedron in the sequence is defined by a single linear inequality [1]. Indeed, 
they are the simplest kind of space-time interpolants, and, for the same reason, 
the ones that best generalize the reachable states along the path. Unfortunately, 
not all paths admit halfspace interpolants, but, if one such sequence exists, then 
it can be extrapolated from the solution of a linear program. 

Consider a path $v_0, e_0, \dots, v_{k+1}$ with the respective dwell times $[\underline{t}_0,\overline{t}_0], \dots,$
$[\underline{t}_k,\overline{t}_k]$. A sequence of halfspace interpolants consists of a sequence of sets
$H_0, \dots, H_{k+1}$ among either any halfspace, or the empty set, or the universe, such
that $Z_0 \subseteq H_0$, $\mathit{jump}_0 \circ \mathit{intflow}_0^{[\underline{t}_0,\overline{t}_0]}(H_0) \subseteq H_1$, ..., $\mathit{jump}_k \circ \mathit{intflow}_k^{[\underline{t}_k,\overline{t}_k]}(H_k) \subseteq
H_{k+1}$, and $H_{k+1} \cap I_{k+1}$ is empty. In contrast with general space-time interpolants,
every time partition consists of a single time interval and therefore the support
function of every post operator $\mathit{jump} \circ \mathit{intflow}^{[\underline{t},\overline{t}]}$ can be encoded into a single
LP (see Sect. 5). We exploit the encoding for extracting halfspace interpolants,
similarly to a recent interpolation technique for PCD systems [7].

We encode the support function in direction d of the closure of the image of 


the post operators along the path, i.e., the set jump, o intflow ^ telo.. -ojumpg o 
intflowlt"'"!(Z,), intersected with the invariant Iķ+1. We obtain the following 
LP over the free vectors ag, ..., 041 and the nonnegative vectors 3, 60,..-, Ôk, 
YO: - - -» Vk+1, M05 Hk; and Vo, ... Vk: 
a k 
minimize qz B + iso 01^ + 94,51 + sl agai) + qf, kl 
subject to PZ, 8 = ap, 
LT i 
M] m — M, vi = =0; for each i € [0..k], (20) 
—p4 tvi c Pl + PhO: = Ri ais for each i € [0..k], 
PE aed = —Qk41 +d, 


where every system of inequalities $Px \le q$ corresponds to the constraints of
the respective init, guard, or invariant, every $R_i x + s_i$ is an update equation,
and every interval matrix $[\underline{M}_i, \overline{M}_i] = \mathrm{intexp}(A_i, \underline{t}_i, \overline{t}_i)$.
In general, one can check whether the closure is contained in a halfspace
$a^{\mathsf T} x \le b$ by setting the direction to its linear term $d = a$ and checking
whether the objective function can equal its constant term $b$. In particular, we check
for emptiness, which we pose as checking inclusion in $0x \le -1$. Therefore, we set
$d = 0$ and the objective function to equal $-1$. Upon affirmative answer, from the
solution $\alpha_0^\star, \alpha_1^\star, \dots, \nu_k^\star$ we obtain a valid
sequence of halfspace interpolants whose $i$-th linear term is given by $\alpha_i^\star$ and $i$-th


constant term is given by qz 6* + cedi, auae d; 6% + 5105 1). 
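
The emptiness check posed above as inclusion in $0x \le -1$ can be illustrated on a generic system $Px \le q$ through Farkas' lemma. The one-dimensional system below is a made-up illustration, not one of the benchmark constraints, and the multiplier $\lambda$ is notation introduced only for this example.

```latex
% Farkas' lemma, scaled so that the certificate value is exactly -1:
\[
\{x : Px \le q\} = \emptyset
\quad\Longleftrightarrow\quad
\exists \lambda \ge 0.\;\; P^{\mathsf T}\lambda = 0 \;\wedge\; q^{\mathsf T}\lambda = -1 .
\]
% Illustration: the constraints x <= 0 and -x <= -1 (that is, x >= 1) are contradictory.
\[
P = \begin{pmatrix} 1 \\ -1 \end{pmatrix},\quad
q = \begin{pmatrix} 0 \\ -1 \end{pmatrix},\quad
\lambda = \begin{pmatrix} 1 \\ 1 \end{pmatrix}:
\qquad
P^{\mathsf T}\lambda = 1 - 1 = 0,
\qquad
q^{\mathsf T}\lambda = 0 - 1 = -1 .
\]
% Summing the two constraints with weights lambda yields 0x <= -1, which no x satisfies.
```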
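input : sequence of intervals $[\underline{u}_0, \overline{u}_0], \dots, [\underline{u}_j, \overline{u}_j]$
output: set of intervals
$b \leftarrow \underline{u}_j$;
while $b < \overline{u}_j$ do
    $a \leftarrow b$;
    $b \leftarrow b + \varepsilon$;
    $c \leftarrow \overline{u}_j$;
    if $[\underline{u}_0, \overline{u}_0], \dots, [\underline{u}_{j-1}, \overline{u}_{j-1}], [a, b]$ does not admit halfspace interpolants then
        continue;
    if $[\underline{u}_0, \overline{u}_0], \dots, [\underline{u}_{j-1}, \overline{u}_{j-1}], [a, c]$ admits halfspace interpolants then
        push $[a, c]$ to the output;
        return;
    while $c - b > \varepsilon$ do
        if $[\underline{u}_0, \overline{u}_0], \dots, [\underline{u}_{j-1}, \overline{u}_{j-1}], [a, \varepsilon\lfloor (b + c)/(2\varepsilon) \rfloor]$ admits halfspace interpolants then
            $b \leftarrow \varepsilon\lfloor (b + c)/(2\varepsilon) \rfloor$;
        else
            $c \leftarrow \varepsilon\lfloor (b + c)/(2\varepsilon) \rfloor$;
    push $[a, b]$ to the output;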


Algorithm 2. Nonswitching time partitioning. 
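
The search performed by Algorithm 2 can be sketched as follows. This is only an illustration under stated assumptions: time is measured in integer multiples of ε, the LP-based interpolation test is replaced by a hypothetical oracle `admits(a, b)`, and that oracle is assumed monotone (a subinterval of an admitting interval also admits), which the text only suggests as likely.

```python
def nonswitching_partition(n_steps, admits):
    """Sketch of Algorithm 2 on a grid of n_steps cells of width eps.

    admits(a, b) is a hypothetical stand-in for "the sequence ending in the
    interval [a*eps, b*eps] admits halfspace interpolants".
    """
    out = []
    b = 0
    while b < n_steps:
        a = b
        b = b + 1              # try one eps-wide cell first
        c = n_steps
        if not admits(a, b):
            continue           # this cell belongs to the switching time
        if admits(a, c):
            out.append((a, c)) # the whole remainder admits interpolants
            return out
        # Binary search: admits(a, b) holds and admits(a, c) fails.
        while c - b > 1:
            mid = (b + c) // 2
            if admits(a, mid):
                b = mid
            else:
                c = mid
        out.append((a, b))
    return out


# Toy oracle (assumption): cells 3 and 7 can never be enclosed.
BAD_CELLS = {3, 7}

def toy_admits(a, b):
    return not any(a <= cell < b for cell in BAD_CELLS)

partition = nonswitching_partition(10, toy_admits)
print(partition)
```

With this toy oracle the maximal admitting intervals are the runs of cells between the excluded ones; the excluded cells are what Algorithm 3 would then treat as switching time.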


6.2 Time Partitioning 


Halfspace interpolation attempts to compute a sequence of enclosures that are 
convex for a sequence of sets that are not necessarily convex. Specifically, it 
requires each halfspace to enclose the set of solutions of a linear differential 
equation, which is nonconvex, by enclosing its convex overapproximation along 
a whole time interval. As a result, large time intervals produce large overap- 
proximations, on which halfspace interpolation might be impossible. Likewise, 
shorter intervals produce tighter overapproximations, which are more likely to 
admit halfspace interpolants. In this section, we exploit this observation to
enable interpolation over large time intervals. In particular, we partition
the time into smaller subintervals and treat each of them as a halfspace
interpolation problem. Later, we combine the results to refine the abstraction.

Time partitioning is a delicate task in the whole abstraction refinement loop.
In fact, while template refinement affects the performance of the abstractor
linearly, partitioning time intervals that can switch induces branching in the search,
possibly leading to an exponential blowup. For this reason, we partition time by
narrowing down the switching time, for incremental precision, until none is
left. In particular, we use Algorithm 2 to compute a set $N$ of maximal intervals
that admit halfspace interpolants, by enlarging or narrowing them by $\varepsilon$ amounts.
We embed this procedure in Algorithm 3 which, along the sequence, excludes
the time in $N$, constructing a set of intervals $S$ that overapproximate the switching
time. In particular, we construct the set with the widest possible intervals
that are disjoint from $N$. Algorithm 3 succeeds when no more intervals are left;
otherwise we halve $\varepsilon$ and reapply it to the sequences that are left to process.
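input : sequence of intervals $[\underline{t}_0, \overline{t}_0], \dots, [\underline{t}_k, \overline{t}_k]$
output: set of sequences of intervals
push $[\underline{t}_0, \overline{t}_0]$ to the queue $Q$;
while $Q$ is not empty do
    pop $[\underline{u}_0, \overline{u}_0], \dots, [\underline{u}_j, \overline{u}_j]$ from $Q$;
    $N \leftarrow$ nonswitching time partitioning of $[\underline{u}_0, \overline{u}_0], \dots, [\underline{u}_j, \overline{u}_j]$;
    foreach $[\underline{a}, \overline{a}] \in N$ do
        push $[\underline{u}_0, \overline{u}_0], \dots, [\underline{u}_{j-1}, \overline{u}_{j-1}], [\underline{a}, \overline{a}]$ to the output;
    if $j = k$ then
        assert $[\underline{u}_j, \overline{u}_j] \setminus \bigcup N = \emptyset$;
        continue;
    $S \leftarrow$ choose set of intervals that cover $[\underline{u}_j, \overline{u}_j] \setminus \bigcup N$;
    foreach $[\underline{b}, \overline{b}] \in S$ do
        push $[\underline{u}_0, \overline{u}_0], \dots, [\underline{u}_{j-1}, \overline{u}_{j-1}], [\underline{b}, \overline{b}], [\underline{t}_{j+1}, \overline{t}_{j+1}]$ to $Q$;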


Algorithm 3. Dwell time partitioning. 
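
The queue-based exploration of Algorithm 3 can be sketched as below. Several things here are assumptions made for illustration: the callback `nonswitching` is a hypothetical stand-in for Algorithm 2, intervals are plain pairs whose open or closed endpoints are not distinguished, and where the algorithm asserts that nothing is left uncovered in the last mode, the sketch instead returns the uncovered sequences so that a caller could retry them with a halved ε.

```python
from collections import deque

def uncovered(lo, hi, covered):
    """Widest subintervals of [lo, hi] disjoint from the intervals in `covered`."""
    gaps, cur = [], lo
    for a, b in sorted(covered):
        if a > cur:
            gaps.append((cur, a))
        cur = max(cur, b)
    if cur < hi:
        gaps.append((cur, hi))
    return gaps

def dwell_time_partitioning(dwell, nonswitching):
    """Sketch of Algorithm 3; returns (output sequences, sequences left to process)."""
    k = len(dwell) - 1
    output, leftover = [], []
    queue = deque([[dwell[0]]])
    while queue:
        seq = queue.popleft()
        j = len(seq) - 1
        lo, hi = seq[-1]
        n = nonswitching(seq)                 # set N of the pseudocode
        for interval in n:
            output.append(seq[:-1] + [interval])
        gaps = uncovered(lo, hi, n)           # set S of the pseudocode
        if j == k:
            leftover.extend(seq[:-1] + [g] for g in gaps)
            continue
        for g in gaps:
            queue.append(seq[:-1] + [g, dwell[j + 1]])
    return output, leftover

# Toy stand-in for Algorithm 2 (assumption): in the first mode only times up
# to 2 admit interpolants; in later modes the whole interval does.
def toy_nonswitching(seq):
    lo, hi = seq[-1]
    if len(seq) == 1:
        return [(lo, min(hi, 2))] if lo < 2 else []
    return [(lo, hi)]

result, rest = dwell_time_partitioning([(0, 4), (0, 3)], toy_nonswitching)
print(result, rest)
```

In the toy run the first mode contributes the nonswitching sequence with dwell time (0, 2), and the remaining time (2, 4) is explored together with the second mode.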


6.3 Abstraction Refinement 


The procedures above construct sequences of time intervals
$[\underline{u}_0, \overline{u}_0], \dots, [\underline{u}_j, \overline{u}_j]$
that are included in $[\underline{t}_0, \overline{t}_0], \dots, [\underline{t}_k, \overline{t}_k]$ and that, together with
the respective halfspace interpolants, constitute a proof of infeasibility for the
counterexample. Yet, they do not form a sequence of space-time interpolants
$X_0, T_0, \dots, X_{k+1}$. We form each partitioning $T_i$ by splitting
$[\underline{t}_i, \overline{t}_i]$ in such a way that each element of $T_i$ is either
contained in $[\underline{u}_i, \overline{u}_i]$ or disjoint from it, for all intervals
$[\underline{u}_i, \overline{u}_i]$. Then, we refine the partitioning of mode $v_i$ similarly.
Each polyhedron $X_i$ is a union of convex polyhedra, each of which is the intersection
of all halfspaces $H_i$ corresponding to some sequence
$[\underline{u}_0, \overline{u}_0], \dots, [\underline{u}_i, \overline{u}_i]$. Nevertheless, to refine the
abstraction we do not need to construct $X_i$, but just to take the outward-pointing
directions of all $H_i$ and add them to the template of $v_i$.


7 Experimental Evaluation 


We implemented our method in C++ using GMP and Eigen for multiple-precision
linear algebra, Arb for interval arithmetic, and PPL for linear programming
[5, 23]. In particular, all the libraries we use are meant to provide guaranteed
solutions, and so is our implementation. We evaluate it on several instances
of a filtered oscillator and a rod reactor, which are both parametric in the number 
of variables, and the latter in the number of modes too [15,35]. We record sev- 
eral statistics from every execution of our tool: the number #cex of counterex- 
amples found during the CEGAR loop, the number #dir of linearly indepen- 
dent directions and the average width of the time partitionings extracted from 
all space-time interpolants. Moreover, we independently measure three times. 
First, the time spent in finding counterexamples, namely the total time taken 
by inconclusive abstractions which returned a spurious counterexample. Second, 
the refinement time, that is the total time consumed by computing space-time 
interpolants. Finally, the verification time, that is the time spent in the last
abstraction of the CEGAR loop, which terminates with a fixpoint proving the
system safe. We compare the outcome and the performance of our tool against 
Ariadne which, to the best of our knowledge, is the only verification tool available 
that is numerically sound and time-unbounded [11]. 


Table 1. Statistics for the benchmark examples (oot when > 1000s). 


benchmark          #var  #mode  #cex  #dir  avg. width | cex. time  ref. time  ver. time  tot. time | Ariadne
filtosc_1st_ord       3      4     7    13        0.55 |      0.57       0.96       0.13       1.66 |   27.56
filtosc_2nd_ord       4      4     7    15        0.55 |      0.83       1.78       0.20       2.81 |   150.7
filtosc_3rd_ord       5      4     7    16        0.55 |      1.28       4.65       0.32       6.25 |     oot
filtosc_4th_ord       6      4     7    18        0.55 |      1.53      11.39       0.37      13.29 |     oot
filtosc_5th_ord       7      4     7    19        0.55 |      2.61      26.60       0.70      29.37 |       -
filtosc_6th_ord       8      4     7    18        0.55 |      4.56      101.8       1.29      107.7 |       -
filtosc_7th_ord       9      4     7    18        0.55 |      4.36      109.9       1.13      114.6 |       -
filtosc_8th_ord      10      4     7    17        0.55 |      5.92      150.9       1.54      158.4 |       -
filtosc_9th_ord      11      4     7    16        0.55 |      6.49      383.1       1.83      391.3 |       -
filtosc_10th_ord     12      4     7    17        0.55 |     12.84     428.87       3.73      445.4 |       -
filtosc_11th_ord     13      4     7    17        0.55 |     15.10      525.2       4.38      544.6 |       -
reactor_1_rod         2      4    11     3        0.11 |      5.24      10.64       1.59      17.47 |     oot
reactor_2_rods        3      5     9     7        0.79 |      5.68       5.36       2.33      13.37 |     oot
reactor_3_rods        4      6    12    13        1.07 |     14.46      13.94      13.13      41.53 |       -
reactor_4_rods        5      7    15    29        1.67 |     45.50      42.47      111.5      199.9 |       -
reactor_5_rods        6      8    16    31        1.81 |     73.77      27.36     696.46      797.5 |       -

The filtered oscillator is a hybrid automaton with four modes that smooths
a signal x into a signal z. It has k + 2 variables and a system of k + 2 affine
ODEs, where k is the order of the filter. Table 1 shows the results, for a scaling
of k up to the 11-th order. The first observation is that the CEGAR loop
behaves quite similarly on all scalings: number of counterexamples, number of 
directions, and time partitionings are almost identical. On the other hand, the 
computation times show a growth, particularly in the refinement phase which 
dominates over abstraction and verification. This suggests that our procedure
exploits efficiently the symmetries of the benchmark. In particular, time parti- 
tioning seems unaffected. What affects the performance is linear programming, 
whose size depends on the number of variables of the system. 

The rod reactor consists of a heating reactor tank and k rods each of which 
cools the tank for some amount of time, excluding each other. The hybrid 
automaton has one variable z for the temperature, k clock variables, one heat- 
ing mode, one error mode, and k cooling modes. If the temperature reaches 
a critical threshold and no rod can intervene, it goes into an error. For this 
benchmark, we start with a simple template, the interval around z, and we dis- 
cover further directions. Table 1 highlights two fundamental differences with the 
previous benchmark. First, the average width grows with the model size. This 
is because the heating mode requires finer time partitioning than the cooling 
modes. The cooling modes increase with the number of rods, and so does the 
average width over all time partitions. Second, while with the filtered oscillator
the difficulty lay in interpolation, for the rod reactor interpolation is rather easy,
as is finding counterexamples. Most of the time is spent in the verification
phase, where all fixpoint checks must be concluded, without being interrupted
by a counterexample. This shows the advantage of our lazy approach, which first 
processes the counterexamples and finally proves the fixpoint. 

Our method outperforms Ariadne on all benchmarks. On the other hand, 
tools like Flow* and SpaceEx can be dramatically faster [9]. For instance, they 
analyze filtosc_8th_ord in resp. 9.1s and 0.36s (time horizon of 4 and jump 
depth of 10). This is hardly surprising, as our method has primarily been 
designed to comply with soundness and time-unboundedness, and pays the price 
for that. 


8 Related Work 


There is a rich literature on CEGAR approaches for hybrid automata, either 
abstracting to a purely discrete system [3,10,27,33,34] or to a hybrid automa- 
ton with simpler dynamics [22,30]. Both categories exploit the principle that the 
verification step is easier to carry out in the abstract domain. The abstraction 
entails a considerable loss of precision that can only be counteracted by increas- 
ing the number of abstract states. This leads to a state explosion that severely 
limits the applicability of such approaches. In contrast, our approach allows us 
to increase the precision by adding template directions, which does not increase 
the number of abstract states. The only case where we incur additional abstract 
states is when partitioning the time domain. This is a direct consequence of the 
nonconvexity of flowpipes of affine systems, and therefore seems to be unavoid- 
able when using convex sets in abstractions. In [26], the abstraction consists 
of removing selected ODE entirely. This reduces the complexity, but does not 
achieve any fine-tuning between accuracy and complexity. Template reachability 
has been shown to be very effective in both scaling up reachability tasks to more 
efficient successor computations [15,31,32] and achieving termination even over 
unbounded time horizons [12]. The drawback of templates is the lack of accuracy, 
which may lead to an approximation error that accumulates excessively. Efforts 
to dynamically refine templates have so far not scaled well for affine dynamics 
[14]. A single-step refinement was proposed in [4], but as was illustrated in [7], 
the refinement needs to be inductive in order to exclude counterexamples in a 
CEGAR scheme. 


9 Conclusion 


We have developed an abstraction refinement scheme that combines the effi- 
ciency and scalability of template reachability with just enough precision to 
exclude all detected paths to the bad states. At each iteration of the refine- 
ment loop, only one template direction is added per mode and time-step. This 
does not increase the number of abstract states. Additional abstract states are 
only introduced when required by the nonconvexity of flowpipes of affine sys- 
tems, a problem that we consider unavoidable. In contrast, existing CEGAR 
approaches for hybrid automata tend to suffer from state explosion, since refining
the abstraction immediately requires additional abstract states. As our experiments
confirm, our approach results in templates of very low complexity and
terminates with an unbounded proof of safety after a relatively small number of 
iterations. Further research is required to extend this work to nondeterministic 
and nonlinear dynamics.

Abstract. High-performance implementations of distributed and multicore
shared objects often guarantee only the weak consistency of their
concurrent operations, foregoing the de-facto yet performance-restrictive 
consistency criterion of linearizability. While such weak consistency is 
often vital for achieving performance requirements, practical automa- 
tion for checking weak-consistency is lacking. In principle, algorithmi- 
cally checking the consistency of executions according to various weak- 
consistency criteria is hard: in addition to the enumeration of lineariza- 
tions of an execution's operations, such criteria generally demand the 
enumeration of possible visibility relations among the linearized opera- 
tions; a priori, both enumerations are exponential. 

In this work we identify an optimization to weak-consistency checking: 
rather than enumerating every possible visibility relation, it suffices to 
consider only the minimal visibility relations which adhere to the various 
constraints of the given criterion, for a significant class of consistency cri- 
teria. We demonstrate the soundness of this optimization, and describe 
an associated minimal-visibility consistency checking algorithm. Empir- 
ically, we show that our algorithm significantly outperforms the baseline 
weak-consistency checking algorithm, which naively enumerates all vis- 
ibilities, and adds only modest overhead to the baseline linearizability 
checking algorithm, which does not enumerate visibilities. 


Keywords: Linearizability - Consistency - Runtime verification 


1 Introduction 


Programming software applications that can deal with multiple clients at the 
same time, and possibly, with clients that connect at different sites in a network, 
relies on optimized concurrent or distributed objects which encapsulate lock- 
free shared memory access or message passing protocols into high-level abstract 
data types. Given the potentially-enormous amount of software that relies on
these objects, it is important to maintain precise specifications and ensure that
implementations adhere to their specifications. 

One of the standard correctness criteria used in this context is linearizabil- 
ity (or strong consistency) [22], which ensures that the results of concurrently- 
executed invocations match the results of some serial execution of those same 
invocations. Ensuring such a criterion in a distributed context (when data is 
replicated at different sites in a network) is practically infeasible or even impos- 
sible [17,19]. Therefore, various weak consistency criteria have been proposed 
like eventual consistency [23,36], “session guarantees” like read-my-writes or 
monotonic-reads [35], causal consistency [25,28], etc. 

An axiomatic framework for formalizing such criteria has been proposed by 
Burckhardt et al. [9, 11]. Essentially, this extends the linearizability-based spec- 
ification methodology with a dynamic visibility relation among operations, in 
addition to the standard dynamic happens-before and linearization relations. 
Permitting weaker visibility relations models outcomes in which an operation 
may not observe the effects of concurrent operations that are linearized before 
it. 

In this work, we propose an online monitoring algorithm that checks whether 
an execution of a concurrent (or distributed) object satisfies a consistency model 
defined in this axiomatic framework. This algorithm constructs a linearization 
and visibility relation satisfying the axioms of the consistency model gradually 
as the execution extends with more operations. It is possible that the lineariza- 
tion and visibility constructed until some point in time are invalidated as more 
operations get executed, which requires the algorithm to backtrack and search 
for different candidates. This exponential blow-up is unavoidable since even the 
problem of checking linearizability is NP-hard in general [18]. 

The main difficulty in devising such an algorithm is coming up with effi- 
cient strategies for enumerating linearizations and visibility relations which min- 
imize the number of candidates needed to be explored and the number of times 
the algorithm has to backtrack. We build on previous works that propose such 
strategies for enumerating linearizations [29,38] in the context of linearizabil- 
ity checking. Roughly, the linearizations are extended iteratively by appending 
operations which are minimal in the happens-before order (among non-linearized 
operations). The choice of the minimal operations to append varies from one 
approach to the other. Our work focuses on combining such strategies with an 
efficient enumeration of visibility relations which are compatible with a given 
linearization. 

Rather than specializing our results to one single consistency model, we con- 
sider a general class of consistency models from Burckhardt et al.’s axiomatic 
framework [9,11] in which the visibility relation among operations is constrained 
to be contained in the linearization relation. That class includes, for instance, 
time-stamp based models employed in distributed object implementations, in 
which time stamps serve to resolve conflicts by effectively linearizing concurrent 
operations. We show that within this class of consistency models, it is not necessary
to enumerate the set of all possible visibility relations (included in the
linearization) in order to check consistency of an execution. More precisely, we
develop an algorithm for enumerating visibility relations that traverses oper- 
ations in linearization order and chooses for each operation o, a minimal set 
of operations visible to o that conforms to the consistency axioms (up to the 
linearization prefix that includes o). In general there may exist multiple such 
minimal sets of operations, and each of them must be explored. When the visi- 
bility relation cannot be extended, the algorithm needs to backtrack and choose 
different minimal visibility sets for previous operations. However, when all the 
minimal candidates have been explored, the algorithm can soundly report that 
the execution is not consistent, without resorting to the exploration of non- 
minimal visibility relations. 

Besides demonstrating the soundness of minimal-visibility consistency check- 
ing, we also demonstrate its empirical impact by applying our algorithm to con- 
current traces of Java concurrent data structures. We find that our algorithm 
consistently outperforms the baseline naive approach to enumerating visibilities, 
which considers also non-minimal visibility relations. Furthermore, we demon- 
strate that minimal-visibility checking adds only modest overhead (roughly 2x) 
to the baseline linearizability checking algorithm, which does not enumerate vis- 
ibilities. This suggests that small sets of minimal visibilities typically suffice in 
practice, and that the additional exponential enumeration of visibilities, atop 
the exponential enumeration of linearizations, may be avoidable in practice. 
Our implementation and experiments are open source, and publicly available
on GitHub.¹

In summary, this work makes the following contributions: 


— we develop a new minimal-visibility consistency-checking algorithm for Bur- 
ckhardt et al.'s axiomatic consistency framework [9,11]; 

— we demonstrate the soundness of minimal-visibility consistency checking; and 

— we demonstrate an empirical evaluation comparing minimal-visibility consis- 
tency checking with the state-of-the-art consistency-checking algorithms. 


To the best of our knowledge, our algorithm is the first completely automatic 
algorithm for checking weak-consistency of arbitrary abstract data type imple- 
mentations which avoids the naive enumeration of all possible visibility relations. 

The rest of this paper is organized as follows. Section 2 elaborates a formalization
of Burckhardt et al.’s axiomatic consistency framework [9,11], and Sect. 3
develops a formal argument for the soundness of considering only minimal visibility
relations. Section 4 describes our overall consistency checking algorithms,
and Sect. 5 describes our implementation and empirical evaluation. Section 6
describes related work, and finally Sect. 7 concludes.


2 Weak Consistency 


We describe a formal model for concurrent (distributed) object implementations. 
Clients interact with an object by making invocations from a set I and receiving 


¹ https://github.com/michael-emmi/violat/releases/tag/cav-2018-submission.

Fig. 1. A history h and an abstract execution containing h.


returns from a set R (parameters of invocations, if any, are part of the invocation
name). An operation is an invocation $i \in I$ paired with a return $r \in R$; we
denote such an operation by $i \Rightarrow r$. We denote individual operations by $o$. The
invocation, resp., the return, in an operation $o$ is denoted by $inv(o)$, resp., $ret(o)$.

The interaction between a client and an object is represented by a history
(po, hb) over a set of operations O which consists of 


— a program (order) po which is a partial order on O, and 
— a happens-before (order) hb which is a partial order on O. 


The program order is enforced by the client, e.g., by invoking a set of operations
within the same thread or process, while the happens-before order represents
the order in which the operations finished, i.e., $(o_1, o_2) \in \mathit{hb}$ iff operation
$o_1$ finished before $o_2$ started. We assume that the program order is included in
the happens-before order.


Example 1. Let us consider a key-value map ADT containing operations of the
form put(key, value) ⇒ old, which insert key-value pairs and return previously-mapped
values for the given keys, remove(key) ⇒ value, which remove key mappings
and return previously-mapped values, contains(value) ⇒ true/false,
which test whether values are currently mapped, and get(key) ⇒ value, which
return currently-mapped values for the given keys. Figure 1(a) pictures a history
h where edges denote the program order po and happens-before hb. Such a history
can be obtained by a client with three threads each making two invocations
(the invocations within the same thread are aligned vertically).


The axiomatic specifications of concurrent objects we consider are based on
the following abstract representation of executions: an abstract execution over 
operations O is a tuple (po, hb, lin, vis) that consists of a history (po, hb) over O, 


— a linearization (order) lin² which is a total order on O, and
— a visibility (relation) vis which is an acyclic relation on O. 


² The linearization is also called arbitration in previous works, e.g., [9].

Intuitively, the visibility relation represents the inter-thread communication, how
effects of operations are visible to other threads, while the linearization order
models the “conflict resolution policy”, how the effects of concurrent operations
are ordered when they become visible to other threads.

We say that an operation $o_1$ such that $(o_1, o_2) \in \mathit{vis}$ is visible to $o_2$, and that
$o_2$ sees $o_1$. Also, the set of operations visible to $o_2$ is called the visibility set of $o_2$.
The extensions of inv and ret to partial orders on O are defined component-wise
as usual.


Example 2. Figure 1(b) pictures an abstract execution containing the history in 
Fig. 1(a). The visibility relation is defined by the edges labeled vis together with 
their transitive closure. The linearization order is defined by the order in which 
operations are written (from top to bottom). 


A consistency criterion for concurrent objects is defined by a set of axioms 
over the relations in an abstract execution. These axioms relate abstract execu- 
tions to a sequential semantics of the operations, which is defined by a function 
$\mathit{Spec} : I^* \times I \to R$ that determines the return value of an invocation given the
sequence of invocations previously executed on the object³.


Example 3. The sequential semantics of the key-value map ADT considered 
in Example 1 is defined as expected. For instance, the return value of 
put(key, value) after a sequence of invocations $\sigma$ is the value null if $\sigma$ contains
no invocation put(key,...), or old if put(key, old) is the last invocation
of the form put(key,...) in $\sigma$.
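
The sequential semantics of Example 3 can be sketched as a function that replays a sequence of invocations on an ordinary dictionary. The tuple encoding of invocations (for instance `("put", 1, 0)`) and the use of `None` for null are assumptions of this sketch, not notation from the paper.

```python
def spec(history, invocation):
    """Return value of `invocation` after sequentially executing `history`.

    Invocations are tuples such as ("put", key, value), ("remove", key),
    ("get", key) or ("contains", value); this encoding is an assumption.
    """
    state = {}
    result = None
    for name, *args in list(history) + [invocation]:
        if name == "put":
            key, value = args
            result = state.get(key)        # previously mapped value, or None
            state[key] = value
        elif name == "remove":
            result = state.pop(args[0], None)
        elif name == "get":
            result = state.get(args[0])
        elif name == "contains":
            result = args[0] in state.values()
        else:
            raise ValueError(f"unknown invocation {name!r}")
    return result


print(spec([], ("put", 1, 0)))                 # first put on key 1
print(spec([("put", 1, 0)], ("put", 1, 1)))    # second put on key 1
```

Only the return value of the final invocation is reported; the returns of earlier invocations in the sequence play no role, matching the signature of Spec.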


The domain $dom(R)$ of a relation $R$ is the set of elements $x$ such that $(x, y) \in R$
for some $y$; the codomain $codom(R)$ is the set of elements $y$ such that $(x, y) \in R$
for some $x$. By an abuse of notation, if $x$ is an individual element, $x \in R$ denotes
the fact that $x \in dom(R) \cup codom(R)$. The (left) composition $R_1 \circ R_2$ of two
binary relations $R_1$ and $R_2$ is the set of pairs $(x, z)$ such that $(x, y) \in R_1$ and
$(y, z) \in R_2$ for some $y$. We denote the identity binary relation $\{(x, x) : x \in X\}$
on a set $X$ by $[X]$, and we write $[x]$ to denote $[\{x\}]$.
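
These relational operators translate directly into operations on sets of pairs. The following is a minimal sketch; the function names and the three-element relation are chosen here for illustration only.

```python
def dom(rel):
    """Elements x with (x, y) in rel for some y."""
    return {x for x, _ in rel}

def codom(rel):
    """Elements y with (x, y) in rel for some x."""
    return {y for _, y in rel}

def compose(r1, r2):
    """Left composition: pairs (x, z) with (x, y) in r1 and (y, z) in r2."""
    return {(x, z) for x, y in r1 for y2, z in r2 if y == y2}

def identity(xs):
    """The identity relation [X] on the set xs."""
    return {(x, x) for x in xs}


# Illustrative relation on operations a, b, c.
vis = {("a", "b"), ("b", "c")}

vis_vis = compose(vis, vis)                      # {("a", "c")}
seen_by_c = dom(compose(vis, identity({"c"})))   # operations visible to c
is_transitive = vis >= vis_vis                   # the axiom vis ⊇ vis ∘ vis
print(vis_vis, seen_by_c, is_transitive)
```

The last line shows how a containment axiom becomes a plain superset test once the relations are materialized as sets.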

Return-value consistency [9], a variant of eventual consistency without liveness
guarantees, states that the return $r$ of every operation $i \Rightarrow r$ can be obtained
from a sequential execution of $i$ that follows the invocations visible to $o$ (in the
linearization order). This constraint will be formalized as an axiom called Ret.
The visibility relation can be chosen arbitrarily. Standard “session guarantees”
can be described in the same framework by adding constraints on the visibility
relation: for instance, read my writes, i.e., operations previously executed in the
same thread remain visible, can be stated as $\mathit{vis} \supseteq \mathit{po}$ and monotonic reads, i.e.,
the set of visible operations to some thread grows monotonically over time, can

³ Previous works have considered more general, concurrent semantics for operations.
We restrict ourselves to sequential semantics in order to simplify the exposition. Our
results extend easily to the general case.

Fig. 2. The grammar of consistency axioms.

Fig. 3. Consistency axiom satisfaction for abstract executions. The satisfaction
relation $\models$ is implicitly parameterized by a sequential semantics Spec
which we consider fixed.

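be stated as $\mathit{vis} \supseteq \mathit{vis} \circ \mathit{po}$. Then, a version of causal consistency [7,9], called
causal convergence, is defined by the following set of axioms:


$\mathit{vis} \supseteq \mathit{vis} \circ \mathit{vis}$    $\mathit{vis} \supseteq \mathit{po}$    $\mathit{lin} \supseteq \mathit{vis}$    $\mathit{Ret}$


which state that the visibility relation is transitive, it includes program order,
and it is included in the linearization order. Finally, linearizability is defined by
the set of axioms $\mathit{lin} \supseteq \mathit{hb}$, $\mathit{vis} = \mathit{lin}$, and $\mathit{Ret}$.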

To state our results in a general context that concerns multiple consistency
criteria defined in the literature (including the ones mentioned above) and variations
thereof, we consider a language of consistency axioms $\phi$ defined by the
grammar in Fig. 2. A consistency model $\Phi$ is a set $\{\phi_1, \phi_2, \dots\}$ of consistency
axioms.

In the following, we assume that every consistency model is stronger than 
return-value consistency, and also, that the linearization order is consistent with 
the visibility and happens-before relations. The assumptions concerning the lin- 
earization order correspond to the fact that for instance, concurrent operations 
are ordered using timestamps that correspond to real-time. Formally, we assume 
that every consistency model contains the axioms 


$\Phi_0 = \{\mathit{Ret},\ \mathit{lin} \supseteq \mathit{vis},\ \mathit{lin} \supseteq \mathit{hb}\}$.


Figure 3 defines the precise semantics of consistency axioms on abstract executions:
the context of an operation $o$ according to a linearization lin and visibility
vis, denoted $ctxt(\mathit{lin}, \mathit{vis}, o)$, is the restriction $([O_o] \circ \mathit{lin} \circ [O_o])$ of lin to
the operations $O_o = dom(\mathit{vis} \circ [o])$ visible to $o$. For instance, for the abstract
execution in Fig. 1(b), $ctxt(\mathit{lin}, \mathit{vis}, \texttt{contains(0)} \Rightarrow \texttt{false})$ is the sequence of
operations put(1,0) ⇒ null; get(1) ⇒ 0; put(1,1) ⇒ 0.
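
The context of an operation and a check of the Ret axiom, read as in the prose description of return-value consistency, can be sketched as follows. The four-operation execution is a hypothetical fragment modeled on the operations named above, not a transcription of Fig. 1(b); the `Op` record, the operation labels, and the reduced map semantics are assumptions of the sketch.

```python
from typing import Any, NamedTuple, Tuple

class Op(NamedTuple):
    name: str      # unique label of the operation (assumption of this sketch)
    inv: Tuple     # invocation, e.g. ("put", 1, 0)
    ret: Any       # observed return value, None standing for null

def spec(history, invocation):
    """Sequential map semantics restricted to put, get and contains."""
    state, result = {}, None
    for name, *args in list(history) + [invocation]:
        if name == "put":
            result = state.get(args[0])
            state[args[0]] = args[1]
        elif name == "get":
            result = state.get(args[0])
        elif name == "contains":
            result = args[0] in state.values()
        else:
            raise ValueError(f"unknown invocation {name!r}")
    return result

def context(lin, vis, o):
    """Operations visible to o, listed in linearization order."""
    return [p for p in lin if (p.name, o.name) in vis]

def satisfies_ret(lin, vis):
    """Every return matches a sequential run over the operation's context."""
    def same(x, y):  # avoid Python treating 0 and False as equal
        return type(x) is type(y) and x == y
    return all(
        same(spec([p.inv for p in context(lin, vis, o)], o.inv), o.ret)
        for o in lin
    )

p1 = Op("p1", ("put", 1, 0), None)
g1 = Op("g1", ("get", 1), 0)
p2 = Op("p2", ("put", 1, 1), 0)
c0 = Op("c0", ("contains", 0), False)
lin = [p1, g1, p2, c0]
vis = {("p1", "g1"), ("p1", "p2"), ("g1", "p2"),
       ("p1", "c0"), ("g1", "c0"), ("p2", "c0")}

print([p.name for p in context(lin, vis, c0)])
print(satisfies_ret(lin, vis))
```

Removing the pair `("p2", "c0")` from `vis` makes the check fail in this fragment, since contains(0) would then be evaluated while key 1 still maps to 0.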

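We extend this semantics to consistency models as $e \models \Phi$ iff $e \models \phi$ for all
$\phi \in \Phi$ and to histories as:


$(\mathit{po}, \mathit{hb}) \models \Phi$ iff $\exists\, \mathit{lin}, \mathit{vis}.\ (\mathit{po}, \mathit{hb}, \mathit{lin}, \mathit{vis}) \models \Phi$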


Example 4. The abstract execution in Fig. 1(b) satisfies causal convergence:
the visibility relation is transitive, it includes program order, and it is consistent
with the linearization order. Moreover, the axiom Ret is also satisfied.
For instance, the invocation contains(0) returns exactly false when executed
after put(1,0); get(1); put(1,1). Similarly, it returns true when executed after
put(1,0); get(1); put(0,0).


3 Minimal Visibility Extensions 


Checking whether a given history satisfies a consistency model is intractable 
in general. This essentially follows from the fact that checking linearizability 
is NP-hard in general [18]. While the main issue in checking linearizability is 
enumerating the exponentially many linearizations, checking weaker criteria like 
causal convergence requires also an enumeration of the exponentially many visi- 
bility relations (included in a given linearization). We prove in this section that 
it is enough to enumerate only minimal visibility relations (w.r.t. set inclusion), 
included in a given linearization, in order to conclude whether a given history 
and linearization satisfy a consistency model. 

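A linearized history $\sigma = (\mathit{po}, \mathit{hb}, \mathit{lin})$ consists of a history and a linearization
lin such that $\mathit{lin} \supseteq \mathit{hb}$. The extension of $\models$ to linearized histories is defined as:


$(\mathit{po}, \mathit{hb}, \mathit{lin}) \models \Phi$ iff $\exists\, \mathit{vis}.\ (\mathit{po}, \mathit{hb}, \mathit{lin}, \mathit{vis}) \models \Phi$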


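The $i$-th element of a sequence $s$ is denoted by $s[i]$ and the prefix of $s$ of
length $i$ is denoted by $s_i$. The projection of a linearized history $\sigma = (\mathit{po}, \mathit{hb}, \mathit{lin})$
to a prefix $\mathit{lin}_i$ of lin is denoted by $\sigma_i$. Formally, $O_i = dom(\mathit{lin}_i) \cup codom(\mathit{lin}_i)$
and $\sigma_i = (\mathit{po} \cap (O_i \times O_i),\ \mathit{hb} \cap (O_i \times O_i),\ \mathit{lin}_i)$.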

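For a linearized history $(\mathit{po}, \mathit{hb}, \mathit{lin})$ and a consistency model $\Phi$, a visibility
relation $\mathit{vis}_i$ on operations from a prefix $\mathit{lin}_i$ of lin is called $\Phi$-extensible when
there exists a visibility relation $\mathit{vis} \supseteq \mathit{vis}_i$ such that $(\mathit{po}, \mathit{hb}, \mathit{lin}, \mathit{vis}) \models \Phi$. The
relation vis is called a $\Phi$-extension of $\mathit{vis}_i$ up to lin. By extrapolation, a
$\Phi$-extension of $\mathit{vis}_i$ up to $\mathit{lin}_j$ is a visibility relation $\mathit{vis}_j$ such that $(\sigma_j, \mathit{vis}_j) \models \Phi$,
for any $i < j$. Such an extension is called minimal when for every other
$\Phi$-extension $\mathit{vis}_j'$ of $\mathit{vis}_i$ up to $\mathit{lin}_j$, we have that $\mathit{vis}_j' \not\subseteq \mathit{vis}_j$.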


Example 5. Consider again the abstract execution in Fig. 1(b). Ignoring the
edges labeled by vis, it becomes a linearized history $\sigma$. The prefix $\sigma_2$ contains just
the two operations put(1,0) ⇒ null and get(1) ⇒ 0. For causal convergence,
the visibility relation $\mathit{vis}_2$ = {(put(1,0) ⇒ null, get(1) ⇒ 0)} on operations of
$\sigma_2$ is extensible, as witnessed by the visibility relation defined for the rest of the
operations in this execution. The visibility relation


$\mathit{vis}_3$ = {(put(1,0) ⇒ null, get(1) ⇒ 0), (put(1,0) ⇒ null, put(0,0) ⇒ null),
(get(1) ⇒ 0, put(0,0) ⇒ null)}


is an extension of $\mathit{vis}_2$ up to $\mathit{lin}_3$, and contains the operations in $\sigma_2$ together with
put(0,0) ⇒ null. Note that this extension is not minimal. A minimal extension
would be exactly equal to $\mathit{vis}_2$ since, intuitively, put(0,0) ⇒ null is not required
to observe operations on keys other than 0.
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The next lemma shows that minimizing the visibility sets of operations in 
a linearization prefix, while preserving the truth of the axioms on that prefix, 
doesn’t exclude visibility choices for future operations (occurring beyond that 
prefix). In more precise terms, the -extensibility status is not affected by choos- 
ing smaller visibility sets for operations in a linearization prefix. For instance, 
since the visibility visa in Example 5 is extensible (for causal convergence), the 
smaller visibility relation in which put(0,0) = null doesn’t see any operation, 
is also extensible. This result relies on the specific form of the axioms, which 
ensure that smaller visibility sets impose fewer constraints on the visibility sets 
of future operations. For instance, the axiom vis 2 vis o vis enforces that vis 
contains ((0,02) : (0,01) € vis) whenever a pair (01,02) is added to vis. Mini- 
mizing the visibility set of o1 will minimize the set of operations that must be 
seen by 02, thus making the choice of the operations visible to 02 more liberal. 
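
The effect of the transitivity axiom described above can be sketched in Java. The sketch assumes that operations are identified by strings, that the visibility relation is already transitive before the call, and that the target operation is the most recently linearized one (so no other operation sees it yet). The names `TransitiveVisibility`, `seenBy` and `addPair` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TransitiveVisibility {
    // seen.get(o) is the set of operations visible to o.
    final Map<String, Set<String>> seen = new HashMap<>();

    Set<String> seenBy(String o) {
        return seen.computeIfAbsent(o, k -> new TreeSet<>());
    }

    // Adds the pair (o1, o2) together with every pair (o, o2) such that
    // (o, o1) is already in vis, as transitivity requires.
    // Returns the number of pairs that were added.
    int addPair(String o1, String o2) {
        Set<String> target = seenBy(o2);
        int before = target.size();
        target.add(o1);
        target.addAll(seenBy(o1));
        return target.size() - before;
    }
}
```

If `o1` sees two operations, adding the pair forces three pairs into the relation; if `o1` sees nothing, only the pair itself is added. This is the sense in which a smaller visibility set for `o1` leaves the choice for `o2` more liberal.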


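Lemma 1. For every linearized history $\sigma$ and consistency model $\Phi$, if

$$\langle \sigma_i, \mathit{vis}_i \rangle \models \Phi,\quad \mathit{vis}_i \text{ is } \Phi\text{-extensible},\quad \langle \sigma_i, \mathit{vis}'_i \rangle \models \Phi,\quad \text{and}\quad \mathit{vis}'_i \subseteq \mathit{vis}_i,$$

then $\mathit{vis}'_i$ is $\Phi$-extensible.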


Proof (Sketch). We show that the $\Phi$-extension $\mathit{vis}$ of $\mathit{vis}_i$ up to $\mathit{lin}$ can be
transformed to a $\Phi$-extension of $\mathit{vis}'_i$ up to $\mathit{lin}$ by simply removing the pairs of
operations in $\mathit{vis}_i \setminus \mathit{vis}'_i$. Let $\mathit{vis}'$ be this visibility relation and $\Phi$ a consistency model.
We prove that $\langle \mathit{po}, \mathit{hb}, \mathit{lin}, \mathit{vis}' \rangle \models \Phi$ by considering the different types of axioms
defined in Fig. 2.

Suppose that $\Phi$ contains an axiom of the form $\mathsf{vis} \supseteq \mathit{rel}$ (according to the
notations in Fig. 2). We have that
$\mathit{vis}'_i \supseteq (\mathit{rel}[\mathit{po}/\mathsf{po}][\mathit{hb}/\mathsf{hb}][\mathit{lin}/\mathsf{lin}][\mathit{vis}'/\mathsf{vis}]) \circ [O_i]$
by the hypothesis (from $\langle \sigma_i, \mathit{vis}'_i \rangle \models \Phi$). Then, $\mathit{vis}'_i \subseteq \mathit{vis}_i$ implies that

$$(\mathit{rel}[\mathit{po}/\mathsf{po}][\mathit{hb}/\mathsf{hb}][\mathit{lin}/\mathsf{lin}][\mathit{vis}/\mathsf{vis}]) \circ [O \setminus O_i] \ \supseteq\ (\mathit{rel}[\mathit{po}/\mathsf{po}][\mathit{hb}/\mathsf{hb}][\mathit{lin}/\mathsf{lin}][\mathit{vis}'/\mathsf{vis}]) \circ [O \setminus O_i]$$

which together with $\mathit{vis}' \circ [O \setminus O_i] = \mathit{vis} \circ [O \setminus O_i]$ (the visibility relations $\mathit{vis}$
and $\mathit{vis}'$ are the same for operations which are not included in the prefix $\mathit{lin}_i$)
implies that

$$\mathit{vis}' \circ [O \setminus O_i] \ \supseteq\ (\mathit{rel}[\mathit{po}/\mathsf{po}][\mathit{hb}/\mathsf{hb}][\mathit{lin}/\mathsf{lin}][\mathit{vis}'/\mathsf{vis}]) \circ [O \setminus O_i].$$

Therefore, $\langle \mathit{po}, \mathit{hb}, \mathit{lin}, \mathit{vis}' \rangle \models \mathsf{vis} \supseteq \mathit{rel}$.

The axiom Ret relates the return value of each operation $o$ in $\sigma$ to the set of
operations visible to $o$. This relation is insensitive to the set of operations seen
by an operation before $o$ in the linearization order. Therefore,
$\langle \mathit{po}, \mathit{hb}, \mathit{lin}, \mathit{vis}' \rangle \models \mathrm{Ret}$ is an immediate consequence of
$\langle \sigma_i, \mathit{vis}'_i \rangle \models \mathrm{Ret}$ and the fact that $\mathit{vis}$ and
$\mathit{vis}'$ are the same for operations which are not included in the prefix $\mathit{lin}_i$.

The axioms of the form $\mathsf{lin} \supseteq \mathit{rel}$ (according to the notations in Fig. 2) are
straightforward implications of $\mathsf{lin} \supseteq \mathsf{hb}$ and $\mathsf{lin} \supseteq \mathsf{vis}$, which are assumed to be
included in any consistency model. They hold for any linearized history.

The main result of this section shows that a visibility enumeration strategy
that considers operations in the linearization order and computes minimal
extensions iteratively, possibly backtracking to another choice of minimal extension
if necessary, is complete in general (it finds a visibility relation satisfying the
consistency axioms $\Phi$ iff the input linearized history satisfies $\Phi$). Backtracking
is necessary since in general, there may exist multiple minimal extensions and
all of them should be explored. For a given linearized history $\sigma$ and visibility
relation $\mathit{vis}$ on operations of $\sigma$, $\mathit{vis}_i = \mathit{vis} \circ [O_i]$ denotes the restriction of $\mathit{vis}$ to
operations from the prefix $\mathit{lin}_i$.


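Theorem 1. For every linearized history $\sigma$ and consistency model $\Phi$, $\sigma \models \Phi$ iff
there exists a visibility relation $\mathit{vis}$ such that

$$\text{for every } i,\ \mathit{vis}_{i+1} \text{ is a minimal } \Phi\text{-extension of } \mathit{vis}_i \text{ up to } \mathit{lin}_{i+1}.$$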


Proof. (Sketch) Let $\sigma$ be a linearized history such that $\sigma \models \Phi$. Therefore, there
exists a visibility relation $\mathit{vis}$ such that $\langle \sigma, \mathit{vis} \rangle \models \Phi$. We prove by induction
that there exists a visibility relation $\mathit{vis}'$ satisfying the claim of the theorem.
Assume that there exists a $\Phi$-extensible visibility relation $\mathit{vis}^{j}$ on operations in
$\mathit{lin}_j$ which satisfies the claim of the theorem for every $i < j$ (we take $\mathit{vis}^{0} = \mathit{vis}$).
Let $\mathit{vis}^{j+1}$ be a minimal visibility relation on operations in $\mathit{lin}_{j+1}$ such that
$\mathit{vis}^{j+1} \circ [O_j] = \mathit{vis}^{j} \circ [O_j]$ and $\langle \sigma_{j+1}, \mathit{vis}^{j+1} \rangle \models \Phi$ (such a set exists because $\mathit{vis}^{j}$
is $\Phi$-extensible). By Lemma 1, $\mathit{vis}^{j+1}$ is $\Phi$-extensible. Also, $\mathit{vis}^{j+1}$ satisfies the
claim of the theorem for every $i < j + 1$. The reverse direction is trivial.


Example 6. In the context of the abstract execution in Fig. 1(b), the visibility
relation defined by removing the vis edge ending in put(0,0) ⇒ null, and adding
the transitive closure, satisfies the requirements in Theorem 1.


4 Efficient Monitoring of Consistency Models 


We describe an algorithm for checking whether a given history satisfies a con- 
sistency model, which combines linearization enumeration strategies proposed 
in [29,38] with the visibility enumeration strategy proposed in Sect. 3. 

The algorithm is defined by the procedure checkConsistency listed in Fig. 4.
This recursive procedure searches for extensions of the input linearization and
visibility (initially, checkConsistency will be called with $\mathit{lin} = \mathit{vis} = \emptyset$) which
witness that the input history $h$ satisfies $\Phi$. It assumes that the inputs $\mathit{lin}$ and $\mathit{vis}$
satisfy the axioms of the consistency model when the input history is projected
on the linearized operations (the operations in $\mathit{lin}$). This projection is denoted
by $h_{\mathit{lin}}$. Formally, the precondition of this procedure is that $\langle h_{\mathit{lin}}, \mathit{lin}, \mathit{vis} \rangle \models \Phi$.

The extensions of lin and vis are built in successive steps. At each step, the 
linearization is extended according to the procedure linExtensions and the 
visibility according to the procedure visExtensions. 

proc checkConsistency(h, Φ, lin, vis) {
  if (isComplete(h, lin)) then
    return true;
  forall lin' of linExtensions(h, lin) do
    forall vis' of visExtensions(h, lin', vis) do
      if checkConsistency(h, Φ, lin', vis') then
        return true;
  return false;
}

proc linExtensions(h, lin) {
  let O = minimals(h, lin);
  forall O' of subsets(O)
    forall seq of linearizations(O')
      let lin' = append(lin, seq);
      yield lin';
}

proc visExtensions(h, lin, vis) {
  forall vis' a minimal Φ-extension of vis up to lin
    yield vis';
}

Fig. 4. Checking consistency of a history. The procedures linExtensions, resp.,
visExtensions return the set of linearizations, resp., visibilities, produced by the
instruction yield.


The abstract implementation of linExtensions, presented in Fig. 4, chooses
a set of non-linearized operations $O$ which are minimal among non-linearized
operations w.r.t. happens-before, i.e., returned by minimals(h, lin), and appends
any linearization of the operations in $O$ to the input linearization $\mathit{lin}$. Formally,
$O \subseteq \{o : o \notin \mathit{lin} \text{ and } \forall o'.\ o' \notin \mathit{lin} \Rightarrow \neg\, o' < o\}$, where $<$ denotes the
happens-before relation. The fact that the operations in $O$ are minimal among
non-linearized operations ensures that the returned linearizations are consistent with
the happens-before order.
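
The selection of minimal non-linearized operations can be sketched in Java as follows. The sketch assumes that the happens-before relation is given as a map from each operation to the set of operations that happen before it; this encoding and the name `LinExtensions` are illustrative assumptions, and only the method name `minimals` is taken from Fig. 4.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class LinExtensions {
    // Returns the non-linearized operations all of whose happens-before
    // predecessors are already linearized. Equivalently, no non-linearized
    // operation happens before them.
    static Set<String> minimals(Map<String, Set<String>> hbPreds, List<String> lin) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : hbPreds.entrySet()) {
            if (lin.contains(e.getKey())) {
                continue; // already linearized
            }
            if (lin.containsAll(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Appending any sequence over a subset of the returned operations keeps the linearization consistent with happens-before, because every predecessor of an appended operation is already in the linearization.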

Two linearization enumeration strategies proposed in the literature can be
seen as instances of linExtensions. The strategy in [38] corresponds to the case
where $O$ contains exactly one minimal operation. For instance, for the history in
Fig. 1(a), this strategy will start by picking a minimal element in the
happens-before relation, say put(1,0) ⇒ null, then, a minimal operation among the rest,
say get(1) ⇒ 0, and so on.

The strategy proposed in [29] is slightly more involved (and according to
experimental results, more efficient), but it relies on a presentation of histories $h$
as sequences of call and return actions (an operation spanning the time interval
between its call and return action). The happens-before order is extracted as
usual: an operation $o_1$ happens before an operation $o_2$ if its return occurs before
the call of $o_2$. This strategy defines $O$ as the first non-linearized operation $o$
that returned in $h$ together with a set of non-linearized operations $O'$ that are
concurrent with $o$ (i.e., are not ordered after $o$ in the happens-before order). The
operation $o$ is linearized last in the returned extensions.

Fig. 5. The history h in Fig. 1 presented as a sequence of call/return actions.

For instance, consider the history $h$ in Fig. 5 represented as a sequence of
call/return actions (small boxes at the beginning, resp., end, of an interval denote
call actions, resp., return actions). The first linearization extension (when
$\mathit{lin} = \emptyset$) includes put(1,0) ⇒ null (the first operation to return) after some
sequence of operations concurrent with it, for instance the empty sequence. Next,
the current linearization put(1,0) ⇒ null can be extended by adding
put(0,0) ⇒ null (the first operation to return, if we exclude put(1,0) ⇒ null
which is already linearized) and possibly get(1) ⇒ 0 before it. Suppose that we
choose put(1,0) ⇒ null; get(1) ⇒ 0; put(0,0) ⇒ null. Then, the extension will
include put(1,1) ⇒ 0 and possibly contains(0) ⇒ true or contains(0) ⇒ false,
and so on. Compared to the previous strategy, an extension step can add multiple
operations.
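
The ingredients of this strategy can be sketched in Java under the assumption that each operation carries integer timestamps for its call and return actions. The timestamps, the class `CallReturnHistory` and its method names are hypothetical; the sketch only derives happens-before from call/return order and selects the first operation to return together with its concurrent candidates, and it does not enumerate the resulting extensions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class CallReturnHistory {
    // An operation spanning the interval between its call and its return.
    static final class Op {
        final String name;
        final int call;
        final int ret;
        Op(String name, int call, int ret) {
            this.name = name;
            this.call = call;
            this.ret = ret;
        }
    }

    // o1 happens before o2 iff o1 returns before o2 is called.
    static boolean happensBefore(Op o1, Op o2) {
        return o1.ret < o2.call;
    }

    // The first non-linearized operation to return, or null if none is left.
    static Op firstToReturn(List<Op> h, Set<String> linearized) {
        Op first = null;
        for (Op o : h) {
            if (!linearized.contains(o.name) && (first == null || o.ret < first.ret)) {
                first = o;
            }
        }
        return first;
    }

    // Non-linearized operations that are not ordered after o by happens-before.
    static List<Op> concurrentWith(Op o, List<Op> h, Set<String> linearized) {
        List<Op> out = new ArrayList<>();
        for (Op p : h) {
            if (p != o && !linearized.contains(p.name) && !happensBefore(o, p)) {
                out.add(p);
            }
        }
        return out;
    }
}
```

An extension step would then append some sequence over a subset of `concurrentWith(o, ...)` followed by `o` itself.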

The extensions of the visibility relation (returned by visExtensions) are
minimal $\Phi$-extensions of $\mathit{vis}$ up to the input linearization. They can be
constructed iteratively by considering the newly linearized operations one by one
and each time computing a minimal extension of the visibility. For instance, the
linearization construction explained in the previous paragraph can be expanded
with a visibility enumeration as follows:


- lin = put(1,0) ⇒ null: the minimal visibility is $\mathit{vis}_1 = \emptyset$,
- lin = put(1,0) ⇒ null; get(1) ⇒ 0; put(0,0) ⇒ null: the minimal visibility
  is $\mathit{vis}_2 = \{\langle \mathtt{put}(1,0) \Rightarrow \mathtt{null},\ \mathtt{get}(1) \Rightarrow 0 \rangle\}$, and so on.


The procedure checkConsistency backtracks to a different extension when 
the current one cannot be completed to include all the operations in the input 
history (checked by the recursive call). The correctness of the algorithm is stated 
in the following theorem. 


Theorem 2. checkConsistency(h, Φ, ∅, ∅) returns true iff $h \models \Phi$.
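
To make the shape of the search concrete, here is a minimal Java sketch of the backtracking procedure, instantiated for a toy setting that is an assumption of this example rather than the paper's implementation: a single register with write and read operations, the model {Ret, lin ⊇ vis, lin ⊇ hb, vis ⊇ hb}, and linearization extensions that add one happens-before-minimal operation at a time. All names (`ToyConsistencyChecker`, `Op`, `ret`, `minimalVisibilities`, `check`) are hypothetical, and the minimal visibility sets are found by brute-force subset enumeration, which is only viable for very small histories.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ToyConsistencyChecker {
    // A register operation: write(value), or a read that returned value.
    // hbPreds holds the indices of operations that happen before this one.
    static final class Op {
        final boolean isWrite;
        final int value;
        final Set<Integer> hbPreds;
        Op(boolean isWrite, int value, Integer... hbPreds) {
            this.isWrite = isWrite;
            this.value = value;
            this.hbPreds = new HashSet<>(Arrays.asList(hbPreds));
        }
    }

    // Toy Ret axiom: a read returns the value of the visible write that is
    // latest in the linearization, or 0 when no write is visible.
    static boolean ret(List<Op> h, List<Integer> lin, int o, Set<Integer> seen) {
        if (h.get(o).isWrite) return true;
        int expected = 0;
        for (int p : lin) {
            if (seen.contains(p) && h.get(p).isWrite) expected = h.get(p).value;
        }
        return expected == h.get(o).value;
    }

    // Inclusion-minimal visibility sets for o: subsets of the linearized
    // operations that contain hbPreds(o) and satisfy Ret.
    static List<Set<Integer>> minimalVisibilities(List<Op> h, List<Integer> lin, int o) {
        List<Set<Integer>> ok = new ArrayList<>();
        for (int mask = 0; mask < (1 << lin.size()); mask++) {
            Set<Integer> seen = new HashSet<>(h.get(o).hbPreds);
            for (int i = 0; i < lin.size(); i++) {
                if (((mask >> i) & 1) == 1) seen.add(lin.get(i));
            }
            if (ret(h, lin, o, seen) && !ok.contains(seen)) ok.add(seen);
        }
        List<Set<Integer>> minimal = new ArrayList<>();
        for (Set<Integer> s : ok) {
            boolean isMinimal = true;
            for (Set<Integer> t : ok) {
                if (!t.equals(s) && s.containsAll(t)) isMinimal = false;
            }
            if (isMinimal) minimal.add(s);
        }
        return minimal;
    }

    // Backtracking search: extend lin by one hb-minimal operation, try each
    // minimal visibility set for it, and undo the choice on failure.
    static boolean check(List<Op> h, List<Integer> lin, Map<Integer, Set<Integer>> vis) {
        if (lin.size() == h.size()) return true;
        for (int o = 0; o < h.size(); o++) {
            if (lin.contains(o) || !lin.containsAll(h.get(o).hbPreds)) continue;
            for (Set<Integer> seen : minimalVisibilities(h, lin, o)) {
                lin.add(o);
                vis.put(o, seen);
                if (check(h, lin, vis)) return true;
                lin.remove(lin.size() - 1);
                vis.remove(o);
            }
        }
        return false;
    }
}
```

Because this toy model has no axiom relating the visibility sets of different operations, a minimal extension only has to choose the set seen by the newly linearized operation. With an axiom such as transitivity, the candidate sets would additionally have to be closed under the pairs that the axiom forces.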


5 Empirical Results 


While our minimal-visibility consistency checking algorithm is applicable to 
a wide class of distributed and multicore shared object implementations, 
here we demonstrate its efficacy on histories recorded from executions of 
Java Development Kit (JDK) Standard Edition concurrent data structures. 
Recent work demonstrates that JDK concurrent data structures regularly admit
non-atomic behaviors, often by design [14]; these weakly-consistent behaviors
span many methods of the java.util.concurrent package, including methods of
ConcurrentHashMap, ConcurrentSkipListMap, ConcurrentSkipListSet,
ConcurrentLinkedQueue, and ConcurrentLinkedDeque, such as the
contains method described in Example 3.

We extracted 4,000 randomly-sampled histories from approximately 8,000 
observed over approximately 1,000,000 executions in stress testing 20 randomly- 
generated client programs of the ConcurrentSkipListMap with up to 15 invo- 
cations across up to 3 threads. In each program, the given number of threads 
invokes its share of randomly-generated methods with randomly-generated val- 
ues. We consider random generation superior to collecting programs in the wild, 
since found client programs can mask inconsistencies by restricting method argu- 
ment values, or by being agnostic to inconsistent return values. Furthermore, 
automated generation gives us the ability to evaluate our algorithm on unbiased 
sample sets, and avoid any technical problems in the collection of programs; it 
also allows us to test method combinations which might not appear in publicly- 
available examples. 

We subject each client program to 1 s of stress testing⁴ to record histories.
The return value of each invocation is stored in a different thread-local vari- 
able which is read at the end of the execution. Recording the happens-before 
order between invocations without affecting implementation behavior signifi- 
cantly (e.g., without influencing the memory orderings between shared-memory 
accesses) is challenging. For instance, we found the use of high-precision timers to 
be unsuitable, since the response-time of System.nanoTime calls is much higher 
than calls to the implementations under test; invoking such timers between each 
invocation of implementation methods would prevent implementation methods 
from overlapping in time, and thus hide any possible inconsistent behaviors. Sim- 
ilarly, the use of atomic operations and volatile variables would impose additional 
synchronization constraints and prevent many weak-memory reorderings. 

Essentially, our solution is to introduce a shared variable per thread storing
its program counter; in our context, the program counter stores the number
of call and return events thus far executed. A thread's program counter is read
by every other thread before and after each invocation. Figure 6 demonstrates a
simplified version⁵ of our encoding for a program with two threads each
invoking two methods. The program counter variables pc0 and pc1 are not declared
volatile: although such declarations would, in principle, provide stronger
guarantees concerning the derived happens-before relation, they would interfere
with implementation weak-memory effects.


⁴ For stress testing we leverage OpenJDK's JCStress tool:
http://openjdk.java.net/projects/code-tools/jcstress/.

⁵ In our actual implementation, each program-counter access is encapsulated within a
method call in order to avoid compiler reordering between the reads of other threads'
counters and the increment of one's own. While the Java memory model does not
guarantee that such encapsulation will prevent reordering, we found this solution to
be adequate on Oracle's Java SE runtime version 9. Our actual implementation also
wraps invocations in try-catch blocks to deal with exceptions.


int pc0 = 0, pc1 = 0;
ConcurrentHashMap obj = new ConcurrentHashMap();

void thread0() {
  Object r0, r1;
  int pcs[][] = new int[4][1];
  int n = 0;
  // first invocation
  pcs[n][0] = pc1; n++; pc0++;
  r0 = obj.elements();
  pcs[n][0] = pc1; n++; pc0++;
  // second invocation
  pcs[n][0] = pc1; n++; pc0++;
  r1 = obj.put(1,0);
  pcs[n][0] = pc1; n++; pc0++;
  // store the values of r0, r1, pcs
}

void thread1() {
  Object r0, r1;
  int pcs[][] = new int[4][1];
  int n = 0;
  // first invocation
  pcs[n][0] = pc0; n++; pc1++;
  r0 = obj.remove(1);
  pcs[n][0] = pc0; n++; pc1++;
  // second invocation
  pcs[n][0] = pc0; n++; pc1++;
  r1 = obj.put(0,1);
  pcs[n][0] = pc0; n++; pc1++;
  // store the values of r0, r1, pcs
}

Fig. 6. Our encoding for recording ConcurrentHashMap histories. Each thread's
program counter is read before and after other threads' invocations, and incremented
subsequent to each such read. The two-dimensional pcs[n][m] array stores n program
counter values for m neighboring threads.


The program counter values read by each thread allow
us to extract a happens-before order between invocations which is sound in the
sense that the actual happens-before may order more operations, but not fewer,
assuming that shared-memory accesses satisfy at least the total-store order
(TSO) semantics in which writes are guaranteed to be performed according to
program order. For instance, when pcs[0][0] ≥ 2 in the second thread (thread1),
the first invocation in the other thread (thread0) happens-before the first
invocation in this thread. Otherwise, if pcs[0][0] < 2, then the two invocations are
overlapping in time. The latter may not be true in the real happens-before due to
the delay in incrementing and reading the program counter variables. Although
some loss of precision is possible, we are unaware of other methods for
tracking happens-before which avoid significant interference with the implementation
under test.
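
The decoding of recorded counter values can be sketched in Java as follows. The sketch assumes the encoding of Fig. 6, in which a thread increments its counter once just before each call and once just after each return; the class `ProgramCounterOrder` and its method names are hypothetical and are not taken from the actual implementation.

```java
class ProgramCounterOrder {
    // The other thread's m-th invocation (counting from 0) has returned once
    // its counter has reached 2 * (m + 1): two increments per invocation.
    static boolean returnedBefore(int observedPc, int m) {
        return observedPc >= 2 * (m + 1);
    }

    // Number of the other thread's invocations that are known to happen
    // before an invocation whose pre-call counter read yielded observedPc.
    static int completedInvocations(int observedPc) {
        return observedPc / 2;
    }
}
```

For example, a pre-call read of 2 or 3 shows that exactly one invocation of the other thread has completed, matching the threshold on pcs[0][0] discussed above, while a read of 0 or 1 orders nothing.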

Based on the encoding described above, we generate histories as sequences 
of call and return actions which serve as input to our consistency checking
algorithms. For simplicity, we have considered just two consistency models,
linearizability and a weak consistency model defined by
$\{\mathrm{Ret},\ \mathsf{lin} \supseteq \mathsf{vis},\ \mathsf{lin} \supseteq \mathsf{hb},\ \mathsf{vis} \supseteq \mathsf{hb}\}$
(see Sect. 2). We consider linearizability in order to measure the overhead of
checking weak consistency due to visibility enumeration; the second model is 
simply the easiest weak-consistency model to support with our implementation; 
the choice among possible weak-consistency models appears fairly arbitrary, since 
the enumeration of visibility relations is common to all. 

Fig. 7. Empirical comparison of (left) standard linearizability checking versus
just-in-time linearizability checking on concurrent traces of Java data structures;
and (right) weak-consistency checking versus standard linearizability checking.
Each point reflects the time in milliseconds for checking a given trace.


We consider several measurements, the results of which are listed in Figs. 7
and 8; all times are measured in milliseconds on a logarithmic scale on a 2.7 GHz
Intel Core i5 MacBook Pro with Oracle's Java SE runtime version 9; and
timeouts are set to 1000 ms. We note that while accurate recording of
operation timings within an execution without interference is challenging, timing the
validation of each recorded history, which we report here, is accomplished
accurately, without interference, by computing the clock difference just before and
after validation.
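
Timing a validation by clock difference can be sketched in Java as below. The class `ValidationTimer` and its method are hypothetical, the validation is passed in as an arbitrary callback, and the sketch only measures elapsed time; it does not interrupt a run that exceeds the timeout.

```java
import java.util.function.BooleanSupplier;

class ValidationTimer {
    // Timeout reported for the experiments, in milliseconds.
    static final long TIMEOUT_MS = 1000;

    // Runs one validation and returns the elapsed time in milliseconds,
    // computed from clock readings taken just before and just after it.
    static long timeMillis(BooleanSupplier validation) {
        long start = System.nanoTime();
        validation.getAsBoolean();
        long end = System.nanoTime();
        return (end - start) / 1_000_000;
    }
}
```

Unlike the recording of histories, the timer here runs outside the code under test, so reading the clock cannot perturb the behavior being measured.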

Our first measurements establish the baseline linearizability and weak- 
consistency checking algorithms. On the left side of Fig. 7 we consider the time 
required to check linearizability for each history by our own implementations 
of Wing and Gong's standard enumerative approach [38], along with Lowe's
"just-in-time linearizability" algorithm [29] (see Sect. 4). We resolve the
non-determinism in these algorithms (e.g., in choosing which pending operation to
attempt linearizing first) arbitrarily (e.g., first called), finding no clear winner: 
each algorithm performs better on some histories. Since these subtleties are out- 
side the scope of our work, we avoid further investigation and choose Wing and 
Gong's algorithm as our baseline linearizability-checking algorithm. 

Our second measurement exposes the overhead of enumerating visibility 
relations for checking weak consistency. On the right side of Fig. 7 we
consider the time required to check weak consistency of a given history versus the
time required to check its linearizability. We observe an overhead of approxi- 
mately 10x due to visibility enumeration and validation. Our naive implemen- 
tation enumerates candidate visibilities in size-decreasing order since we expect 
visibility-loss to be the exception rather than the rule; for instance, atomic opera- 
tions observe all linearized-before operations. We omit the analogous comparison 
between weak-consistency checking and just-in-time linearizability checking to 
avoid redundancy, since the just-in-time optimization is a seemingly-insignificant 
factor in our experiments: the results are nearly identical. 


⁶ Due to a benign error in the decoding of results of stress testing, we observe one
single point on which the two algorithms conflict, labeled by "Unknown."

Fig. 8. Empirical comparison of (left) standard weak-consistency checking versus
minimal-visibility weak-consistency checking on concurrent traces of Java data struc- 
tures; and (right) the latter versus standard linearizability checking. Each point reflects 
the time in milliseconds for checking a given trace. 


Our third measurement demonstrates the impact of our minimal-visibility 
consistency checking optimization. On the left side of Fig. 8 we consider the
time required to check weak consistency without and with our optimization. The 
difference is dramatic, with our optimized algorithm consistently outperforming, 
sometimes up to multiple orders of magnitude: the leftmost 1000 ms timeout 
of the naive algorithm is matched by a roughly 18 ms positive identification. 
Finally, our fourth measurement, on the right side of Fig. 8, demonstrates that 
the overhead of our minimal-visibility checking algorithm over linearizability 
checking is quite modest: we observe roughly a 2x overhead, compared with the 
observed 10x overhead without optimization. 

While our experiments clearly demonstrate the efficacy of our minimal- 
visibility consistency checking algorithm, we will continue to evaluate this opti- 
mization across a wide range of concurrent objects, consistency models, and 
client programs, e.g., including many more concurrent threads. While we do 
expect the performance of linearizability- and weak-consistency checking to vary 
with thread count, we expect the performance gains of minimal-visibility consis- 
tency checking to continue to hold. 


6 Related Work 


Herlihy and Wing [22] described linearizability, which is the standard consistency 
criterion for shared-memory concurrent objects. Motivated by replication-based 
distributed systems, Burckhardt et al. [9,11] describe a more general axiomatic 
framework for specifying weaker consistencies like eventual consistency [36] and 
causal consistency [2]. Our weak consistency checking algorithm applies to con- 
sistency models described in this framework. 

While several static techniques have been developed to prove linearizability
[1, 4, 6, 12, 13, 21, 22, 24, 26, 27, 30-34, 37, 39], few have addressed dynamic
techniques such as testing and runtime verification. The works in [29, 38] describe
monitors for checking linearizability that construct linearizations of a given
history incrementally, in an online fashion. Line-Up [10] performs systematic
concurrency testing via schedule enumeration, and offline linearizability checking
via linearization enumeration. Our weak consistency checking algorithm com- 
bines these approaches with an efficient enumeration of visibility relations. The 
works in [15,16] propose a symbolic enumeration of linearizations based on a 
SAT solver. Although more efficient in practice, this approach applies only to 
certain ADTs. In this work, we propose a generic approach that assumes no 
constraints on the sequential semantics of the concurrent objects. 

Bouajjani et al. [7] consider the problem of verifying causal consistency. They 
propose an algorithm for checking whether a given execution satisfies causal 
consistency, but only for the key-value map ADT with simple put and get 
operations. Our work proposes a generic algorithm that can deal with various 
weak consistency criteria and ADTs. 

From the complexity standpoint, Gibbons and Korach [18] showed that mon- 
itoring even the single-value register type for linearizability is NP-hard. Alur 
et al. [3] showed that checking linearizability of all executions of a given imple- 
mentation is in EXPSPACE when the number of concurrent operations is bounded, 
and then Hamza [20] established EXPSPACE-completeness. Bouajjani et al. [5] 
showed that the problem becomes undecidable once the number of concurrent 
operations is unbounded. Also, Bouajjani et al. [7,8] investigate various ADTs 
for which the problems of checking eventual and causal consistency are decidable. 


7 Conclusion 


We have developed the first completely-automatic algorithm for checking weak 
consistency of arbitrary concurrent object implementations which avoids the 
naive enumeration of all possible visibility relations. While methodologies for 
constructing reliable yet weakly-consistent implementations are relatively imma- 
ture, we believe that such implementations will continue to be important for the 
development of distributed and multicore software systems. Likewise, automa- 
tion for testing and verifying such implementations is, and will increasingly be, 
important. Besides improving state-of-the-art verification algorithms, our results 
represent an important step for future research which may find other ways to 
exploit the soundness of considering only minimal visibilities, on which our opti- 
mized algorithm relies. 


References 


1. Abdulla, P.A., Haziza, F., Holík, L., Jonsson, B., Rezine, A.: An integrated speci- 
fication and verification technique for highly concurrent data structures. In: Piter- 
man, N., Smolka, S.A. (eds.) TACAS 2013. LNCS, vol. 7795, pp. 324-338. Springer, 
Heidelberg (2013). https://doi.org/10.1007/978-3-642-36742-7_23 

2. Ahamad, M., Neiger, G., Burns, J.E., Kohli, P., Hutto, P.W.: Causal memory: def- 
initions, implementation, and programming. Distrib. Comput. 9(1), 37-49 (1995). 
https://doi.org/10.1007/BF01784241 


3. Alur, R., McMillan, K.L., Peled, D.A.: Model-checking of correctness conditions
for concurrent objects. Inf. Comput. 160(1-2), 167-188 (2000).
https://doi.org/10.1006/inco.1999.2847

4. Amit, D., Rinetzky, N., Reps, T., Sagiv, M., Yahav, E.: Comparison under
abstraction for verifying linearizability. In: Damm, W., Hermanns, H. (eds.) CAV 2007.
LNCS, vol. 4590, pp. 477-490. Springer, Heidelberg (2007).
https://doi.org/10.1007/978-3-540-73368-3_49

5. Bouajjani, A., Emmi, M., Enea, C., Hamza, J.: Verifying concurrent programs
against sequential specifications. In: Felleisen, M., Gardner, P. (eds.) ESOP 2013.
LNCS, vol. 7792, pp. 290-309. Springer, Heidelberg (2013).
https://doi.org/10.1007/978-3-642-37036-6_17

6. Bouajjani, A., Emmi, M., Enea, C., Hamza, J.: Tractable refinement checking
for concurrent objects. In: Rajamani, S.K., Walker, D. (eds.) Proceedings of the
42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming
Languages, POPL 2015, 15-17 January 2015, Mumbai, India, pp. 651-662. ACM
(2015). https://doi.org/10.1145/2676726.2677002

7. Bouajjani, A., Enea, C., Guerraoui, R., Hamza, J.: On verifying causal consistency.
In: Castagna, G., Gordon, A.D. (eds.) Proceedings of the 44th ACM SIGPLAN
Symposium on Principles of Programming Languages, POPL 2017, 18-20 January
2017, Paris, France, pp. 626-638. ACM (2017).
http://dl.acm.org/citation.cfm?id=3009888

8. Bouajjani, A., Enea, C., Hamza, J.: Verifying eventual consistency of optimistic
replication systems. In: Jagannathan, S., Sewell, P. (eds.) The 41st Annual ACM
SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL
2014, 20-21 January 2014, San Diego, CA, USA, pp. 285-296. ACM (2014).
https://doi.org/10.1145/2535838.2535877

9. Burckhardt, S.: Principles of eventual consistency. Found. Trends Program. Lang.
1(1-2), 1-150 (2014). https://doi.org/10.1561/2500000011

10. Burckhardt, S., Dern, C., Musuvathi, M., Tan, R.: Line-up: a complete and
automatic linearizability checker. In: Zorn, B.G., Aiken, A. (eds.) Proceedings of the
2010 ACM SIGPLAN Conference on Programming Language Design and
Implementation, PLDI 2010, 5-10 June 2010, Toronto, Ontario, Canada, pp. 330-340.
ACM (2010). https://doi.org/10.1145/1806596.1806634

11. Burckhardt, S., Gotsman, A., Yang, H., Zawirski, M.: Replicated data types:
specification, verification, optimality. In: Jagannathan, S., Sewell, P. (eds.) The 41st
Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming
Languages, POPL 2014, 20-21 January 2014, San Diego, CA, USA, pp. 271-284. ACM
(2014). https://doi.org/10.1145/2535838.2535848

12. Dodds, M., Haas, A., Kirsch, C.M.: A scalable, correct time-stamped stack.
In: Rajamani, S.K., Walker, D. (eds.) Proceedings of the 42nd Annual ACM
SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL
2015, 15-17 January 2015, Mumbai, India, pp. 233-246. ACM (2015).
https://doi.org/10.1145/2676726.2676963

13. Drăgoi, C., Gupta, A., Henzinger, T.A.: Automatic linearizability proofs of
concurrent objects with cooperating updates. In: Sharygina, N., Veith, H. (eds.) CAV
2013. LNCS, vol. 8044, pp. 174-190. Springer, Heidelberg (2013).
https://doi.org/10.1007/978-3-642-39799-8_11

14. Emmi, M., Enea, C.: Exposing non-atomic methods of concurrent objects. CoRR
abs/1706.09305 (2017). http://arxiv.org/abs/1706.09305


15. Emmi, M., Enea, C.: Sound, complete, and tractable linearizability monitoring for
concurrent collections. PACMPL 2(POPL), 25:1-25:27 (2018).
https://doi.org/10.1145/3158113

16. Emmi, M., Enea, C., Hamza, J.: Monitoring refinement via symbolic reasoning.
In: Grove, D., Blackburn, S. (eds.) Proceedings of the 36th ACM SIGPLAN
Conference on Programming Language Design and Implementation, 15-17 June 2015,
Portland, OR, USA, pp. 260-269. ACM (2015).
https://doi.org/10.1145/2737924.2737983

17. Fischer, M.J., Lynch, N.A., Paterson, M.: Impossibility of distributed consensus
with one faulty process. J. ACM 32(2), 374-382 (1985).
https://doi.org/10.1145/3149.214121

18. Gibbons, P.B., Korach, E.: Testing shared memories. SIAM J. Comput. 26(4),
1208-1244 (1997). https://doi.org/10.1137/S0097539794279614

19. Gilbert, S., Lynch, N.A.: Brewer's conjecture and the feasibility of consistent,
available, partition-tolerant web services. SIGACT News 33(2), 51-59 (2002).
https://doi.org/10.1145/564585.564601

20. Hamza, J.: On the complexity of linearizability. In: Bouajjani, A., Fauconnier,
H. (eds.) NETYS 2015. LNCS, vol. 9466, pp. 308-321. Springer, Cham (2015).
https://doi.org/10.1007/978-3-319-26850-7_21

21. Henzinger, T.A., Sezgin, A., Vafeiadis, V.: Aspect-oriented linearizability proofs.
In: D'Argenio, P.R., Melgratti, H. (eds.) CONCUR 2013. LNCS, vol. 8052, pp.
242-256. Springer, Heidelberg (2013).
https://doi.org/10.1007/978-3-642-40184-8_18

22. Herlihy, M., Wing, J.M.: Linearizability: a correctness condition for concurrent
objects. ACM Trans. Program. Lang. Syst. 12(3), 463-492 (1990).
https://doi.org/10.1145/78969.78972

23. Kawell Jr., L., Beckhardt, S., Halvorsen, T., Ozzie, R., Greif, I.: Replicated
document management in a group communication system. In: Proceedings of the 1988
ACM Conference on Computer-Supported Cooperative Work, p. 395. CSCW 1988.
ACM, New York (1988). https://doi.org/10.1145/62266.1024798

24. Khyzha, A., Gotsman, A., Parkinson, M.: A generic logic for proving linearizability.
In: Fitzgerald, J., Heitmeyer, C., Gnesi, S., Philippou, A. (eds.) FM 2016. LNCS,
vol. 9995, pp. 426-443. Springer, Cham (2016).
https://doi.org/10.1007/978-3-319-48989-6_26

25. Lamport, L.: Time, clocks, and the ordering of events in a distributed system.
Commun. ACM 21(7), 558-565 (1978). https://doi.org/10.1145/359545.359563

26. Liang, H., Feng, X.: Modular verification of linearizability with non-fixed
linearization points. In: Boehm, H., Flanagan, C. (eds.) ACM SIGPLAN Conference on
Programming Language Design and Implementation, PLDI 2013, 16-19 June 2013,
Seattle, WA, USA, pp. 459-470. ACM (2013).
https://doi.org/10.1145/2462156.2462189

27. Liu, Y., Chen, W., Liu, Y.A., Sun, J.: Model checking linearizability via refinement.
In: Cavalcanti, A., Dams, D.R. (eds.) FM 2009. LNCS, vol. 5850, pp. 321-337.
Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-05089-3_21

28. Lloyd, W., Freedman, M.J., Kaminsky, M., Andersen, D.G.: Don't settle for
eventual: scalable causal consistency for wide-area storage with COPS. In: Wobber,
T., Druschel, P. (eds.) Proceedings of the 23rd ACM Symposium on Operating
Systems Principles 2011, SOSP 2011, 23-26 October 2011, Cascais, Portugal, pp.
401-416. ACM (2011). https://doi.org/10.1145/2043556.2043593

29. Lowe, G.: Testing for linearizability. Concurr. Comput.: Pract. Exp. 29(4) (2017).
https://doi.org/10.1002/cpe.3928


30. O'Hearn, P.W., Rinetzky, N., Vechev, M.T., Yahav, E., Yorsh, G.: Verifying 
linearizability with hindsight. In: Richa, A.W., Guerraoui, R. (eds.) Proceedings of 
the 29th Annual ACM Symposium on Principles of Distributed Computing, PODC 
2010, 25-28 July 2010, Zurich, Switzerland, pp. 85-94. ACM (2010). 
https://doi.org/10.1145/1835698.1835722 

31. Schellhorn, G., Wehrheim, H., Derrick, J.: How to prove algorithms linearisable. 
In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 243-259. 
Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31424-7_21 

32. Sergey, I., Nanevski, A., Banerjee, A.: Mechanized verification of fine-grained 
concurrent programs. In: Grove, D., Blackburn, S. (eds.) Proceedings of the 36th 
ACM SIGPLAN Conference on Programming Language Design and Implementation, 
15-17 June 2015, Portland, OR, USA, pp. 77-87. ACM (2015). 
https://doi.org/10.1145/2737924.2737964 

33. Sergey, I., Nanevski, A., Banerjee, A.: Specifying and verifying concurrent 
algorithms with histories and subjectivity. In: Vitek, J. (ed.) ESOP 2015. LNCS, vol. 
9032, pp. 333-358. Springer, Heidelberg (2015). 
https://doi.org/10.1007/978-3-662-46669-8_14 

34. Shacham, O., Bronson, N.G., Aiken, A., Sagiv, M., Vechev, M.T., Yahav, E.: 
Testing atomicity of composed concurrent operations. In: Lopes, C.V., Fisher, K. 
(eds.) Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-Oriented 
Programming, Systems, Languages, and Applications, OOPSLA 2011, 
part of SPLASH 2011, 22-27 October 2011, Portland, OR, USA, pp. 51-64. ACM 
(2011). https://doi.org/10.1145/2048066.2048073 

35. Terry, D.B., Demers, A.J., Petersen, K., Spreitzer, M.J., Theimer, M.M., Welch, 
B.B.: Session guarantees for weakly consistent replicated data. In: Proceedings of 
the Third International Conference on Parallel and Distributed Information 
Systems, PDIS 1994, pp. 140-150. IEEE Computer Society Press, Los Alamitos 
(1994). http://dl.acm.org/citation.cfm?id=381992.383631 

36. Terry, D.B., Theimer, M., Petersen, K., Demers, A.J., Spreitzer, M., Hauser, C.: 
Managing update conflicts in Bayou, a weakly connected replicated storage system. 
In: Jones, M.B. (ed.) Proceedings of the Fifteenth ACM Symposium on Operating 
System Principles, SOSP 1995, 3-6 December 1995, Copper Mountain Resort, 
Colorado, USA, pp. 172-183. ACM (1995). https://doi.org/10.1145/224056.224070 

37. Vafeiadis, V.: Automatically proving linearizability. In: Touili, T., Cook, B., 
Jackson, P. (eds.) CAV 2010. LNCS, vol. 6174, pp. 450-464. Springer, Heidelberg 
(2010). https://doi.org/10.1007/978-3-642-14295-6_40 

38. Wing, J.M., Gong, C.: Testing and verifying concurrent objects. J. Parallel Distrib. 
Comput. 17(1-2), 164-182 (1993). https://doi.org/10.1006/jpdc.1993.1015 

39. Zhang, S.J.: Scalable automatic linearizability checking. In: Taylor, R.N., Gall, 
H.C., Medvidovic, N. (eds.) Proceedings of the 33rd International Conference on 
Software Engineering, ICSE 2011, 21-28 May 2011, Waikiki, Honolulu, HI, USA, 
pp. 1185-1187. ACM (2011). https://doi.org/10.1145/1985793.1986037 




Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the chapter's 
Creative Commons license, unless indicated otherwise in a credit line to the material. If 
material is not included in the chapter's Creative Commons license and your intended 
use is not permitted by statutory regulation or exceeds the permitted use, you will 
need to obtain permission directly from the copyright holder. 




Monitoring CTMCs by Multi-clock 
Timed Automata 


Yijun Feng¹, Joost-Pieter Katoen², Haokun Li¹, Bican Xia¹, 
and Naijun Zhan³,⁴ 


¹ LMAM and School of Mathematical Sciences, Peking University, Beijing, China 
ker@protonmail.ch, xbc@math.pku.edu.cn 
² RWTH Aachen University, Aachen, Germany 
katoen@cs.rwth-aachen.de 
³ State Key Laboratory of Computer Science, Institute of Software, 
Chinese Academy of Sciences, Beijing, China 
znj@ios.ac.cn 
⁴ University of Chinese Academy of Sciences, Beijing, China 


Abstract. This paper presents a numerical algorithm to verify 
continuous-time Markov chains (CTMCs) against multi-clock deterministic 
timed automata (DTA). These DTA allow for specifying properties 
that cannot be expressed in CSL, the logic for CTMCs used by 
state-of-the-art probabilistic model checkers. The core problem is to compute 
the probability of timed runs by the CTMC C that are accepted by the 
DTA A. These likelihoods equal reachability probabilities in an embedded 
piecewise deterministic Markov process (EPDP) obtained as product 
of C and A's region automaton. This paper provides a numerical 
algorithm to efficiently solve the PDEs describing these reachability 
probabilities. The key insight is to solve an ordinary differential equation (ODE) 
that exploits the specific characteristics of the product EPDP. We 
provide the numerical precision of our algorithm and present experimental 
results with a prototypical implementation. 


1 Introduction 


Continuous-time Markov chains (CTMCs) [17] are ubiquitous. They are used to 
model safety-critical systems like communicating networks and power manage- 
ment systems, are key to performance and dependability analysis, and naturally 
describe chemical reaction networks. The algorithmic verification of CTMCs 
has received quite some attention. Aziz et al. [3] proved that verifying CTMCs 
against CSL (Continuous Stochastic Logic) is decidable. CSL is a probabilistic 
and timed branching-time logic that allows for expressing properties like “is the 
probability of a given chemical reaction within 50 time units at least 107??". 
Baier et al. [5] gave efficient numerical algorithms for CSL model checking that 
nowadays provide the basis of CTMC model checking in PRISM [23], MRMC [22] 
and Storm [15], as well as GreatSPN [2]. Extensions of CSL to cascaded timed- 
until operators [27], conditional probabilities [19], and (simple) timed regular 
expressions [4] have been considered. 
This paper considers the verification of CTMCs against linear-time real-time 
properties. These include relevant properties in the design of a gas burner [28], 
like “the probability that the duration of leaking is more than one twentieth 
over an interval with a length more than 20s is less than 10-9”. Such 
real-time properties can be conveniently expressed by deterministic timed automata 
(DTA) [1]. The core problem in the verification of CTMC C against DTA A 
is to compute the probability of C's timed runs that are accepted by A, i.e., 
Pr(C |= A). Chen et al. [10, 11] showed that this quantity equals the reachability 
probability in a piecewise deterministic Markov process (PDP) [14]. This PDP 
is obtained by taking the product of CTMC C and the region automaton of A. 
Computing reachability probabilities in PDPs is a challenge. 

Practical implementations of verifying CTMCs against DTA specifications 
are rare. Barbot et al. [7] showed that for single-clock DTA, the PDP is in 
fact a Markov regenerative process. (This observation is also at the heart of 
model-checking CSL^TA [16].) This implies that for single-clock DTA, 
off-the-shelf CSL model-checking algorithms can be employed, resulting in an efficient 
procedure [7]. Mikeev et al. [24] generalised these ideas to infinite-state CTMCs 
obtained from stoichiometric equations, whereas Chen et al. [12] extended the 
theory for verifying single-clock DTA to continuous-time Markov decision 
processes. 

Multi-clock DTA are, however, much harder to handle. The characterisation 
of PDP reachability probabilities as the unique solution of a set of partial 
differential equations (PDEs) [10,11] does not give insight into an efficient 
computational procedure. With the notable exception of [25], verifying PDPs has not 
been considered. Fu [18] provided an algorithm to approximate the probabilities 
using finite difference methods and gave an error bound. This method scales 
poorly and was therefore never implemented. The same holds for 
model-checking using other linear-time real-time formalisms such as MTL and timed 
automata [9], linear duration invariants [8], and probabilistic duration calculus 
[13]. All these multi-clock approaches suffer from scalability issues due to the 
low efficiency of solving the PDEs and/or integral equations on which they heavily 
depend. 

This paper presents a numerical technique to approximate the reachability 
probability in the product PDP. The DTA $\mathcal{A}$ is approximated by the DTA 
$\mathcal{A}[t_f]$, which extends $\mathcal{A}$ with an additional clock that is never 
reset and that needs to be at most $t_f$ when accepting. By increasing the time bound 
$t_f$, the DTA $\mathcal{A}[t_f]$ approximates $\mathcal{A}$ arbitrarily closely. We show 
that the set of PDEs characterizing the reachability probability in the embedded PDP 
of $\mathcal{C}$ and $\mathcal{A}[t_f]$ can be reduced to an ordinary differential 
equation (ODE). The specific characteristics of the product EPDP, in particular the 
fact that all clocks run at the same pace, are key to obtaining these ODEs. Our 
numerical algorithm to solve the ODEs is based on computing the approximations in a 
backward manner using $t_f$ and the sum of all clocks. The complexity of the resulting 
procedure is linear in the EPDP size, and exponential in $\lceil t_f/\delta \rceil$, 
where $\delta$ is the discretization step size. We show that the approximations 
converge to the real solution of the ODEs at a rate linear in $\delta$. Using a 
prototypical tool implementation, we present results on a number of case studies, 
such as robot navigation, with a varying number of clocks in their specification. 
The experimental results are promising for checking CTMCs against multi-clock DTA. 


Organization of the Paper. Section 2 introduces basic notions including 
CTMCs, DTA, and PDPs. Section 3 presents the product of a CTMC and the 
region graph of a DTA and shows that this is an embedded PDP. Section 4 derives the 
PDE (fixing a flaw in [10]) and the reduction to the set of ODEs, and presents the 
numerical algorithm to solve these ODEs. Section 5 presents the experimental 
results and Sect. 6 concludes. 


2 Preliminaries 


In this section, we introduce some basic notions which will be used later. 

A probability space is denoted by a triple $(\Omega, \mathcal{F}, \Pr)$, where 
$\Omega$ is a set of samples, $\mathcal{F}$ is a $\sigma$-algebra over $\Omega$, and 
$\Pr: \mathcal{F} \to [0,1]$ is a probability measure on $\mathcal{F}$ with 
$\Pr(\Omega) = 1$. Let $P_r(\Omega)$ denote the set of all probability measures over 
$\Omega$. For a random variable $X$ on the probability space, its expectation is 
denoted by $\mathbb{E}(X)$. 


2.1 Continuous-Time Markov Chain (CTMC) 
Definition 1 (CTMC). A CTMC is a tuple $\mathcal{C} = (S, P, \alpha, AP, L, E)$, where 

- $S$ is a finite set of states; 
- $P: S \times S \to [0,1]$ is the transition probability function, which is identified 
  with the matrix $P \in [0,1]^{|S| \times |S|}$ such that $\sum_{s' \in S} P(s,s') = 1$ 
  for all $s \in S$; 
- $\alpha \in P_r(S)$ is the initial distribution; 
- $AP$ is a finite set of atomic propositions; 
- $L: S \to 2^{AP}$ is a labeling function; and 
- $E: S \to \mathbb{R}_{>0}$ is the exit rate function. 


We denote by $s \xrightarrow{t} s'$ a transition from state $s$ to state $s'$ after 
residing in state $s$ for $t$ time units. The probability of the occurrence of this 
transition within $t$ time units is $P(s,s') \int_0^t E(s) e^{-E(s)x}\,dx$, where 
$\int_0^t E(s) e^{-E(s)x}\,dx$ stands for the probability to leave state $s$ within 
$t$ time units, and $P(s,s')$ for the probability to select the transition to $s'$ 
from all transitions outgoing from $s$. A state $s$ is called absorbing if 
$P(s,s) = 1$. Given a CTMC $\mathcal{C}$, removing the exit rate function $E$ results 
in a discrete-time Markov chain (DTMC), which is called the embedded DTMC of 
$\mathcal{C}$. A CTMC $\mathcal{C}$ is called irreducible if there exists a unique 
stationary distribution $\alpha$ such that $\alpha(s) > 0$ for all $s \in S$, and 
weakly irreducible if $\alpha(s)$ may be zero for some $s \in S$. 
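
The integral above has the closed form $1 - e^{-E(s)t}$, so the transition probability can be evaluated directly. The following is a minimal sketch, assuming a dictionary-based encoding of $P$ and $E$; the function name and the two-state chain are hypothetical and serve only as an illustration.

```python
import math

def transition_prob_within(P, E, s, s_next, t):
    """Probability of moving from s to s_next within t time units:
    P(s, s') * (1 - exp(-E(s) * t))."""
    if t < 0:
        raise ValueError("t must be non-negative")
    leave_prob = 1.0 - math.exp(-E[s] * t)  # probability to leave s within t
    return P[s].get(s_next, 0.0) * leave_prob

# Hypothetical two-state CTMC, used only for illustration.
P = {"s0": {"s1": 1.0}, "s1": {"s1": 1.0}}
E = {"s0": 2.0, "s1": 1.0}
prob = transition_prob_within(P, E, "s0", "s1", 0.5)
```
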


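Definition 2 (CTMC Path). Let $\mathcal{C}$ be a CTMC. A path $\rho$ of $\mathcal{C}$ 
starting from $s_0$ with length $n$ is a sequence 
$\rho = s_0 \xrightarrow{t_0} s_1 \xrightarrow{t_1} \cdots \xrightarrow{t_{n-1}} s_n \in S \times (\mathbb{R}_{>0} \times S)^n$. 
The set of paths in $\mathcal{C}$ with length $n$ is denoted by 
$Path^{\mathcal{C}}_n$; the set of all finite paths of $\mathcal{C}$ is 
$Path^{\mathcal{C}}_{fin} = \bigcup_n Path^{\mathcal{C}}_n$ and the set of infinite 
paths of $\mathcal{C}$ is $Path^{\mathcal{C}}_{inf} = (S \times \mathbb{R}_{>0})^{\omega}$. 
We use $Path^{\mathcal{C}} = Path^{\mathcal{C}}_{fin} \cup Path^{\mathcal{C}}_{inf}$ to 
denote all paths in $\mathcal{C}$. As a convention, $\epsilon$ stands for the empty path. 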


Note that we assume the time to exit a state is strictly greater than 0. For 
an infinite path $\rho$, we use $Pref(\rho)$ to denote the set of its finite prefixes. 
For a (finite or infinite) path $\rho$ with prefix 
$s_0 \xrightarrow{t_0} s_1 \xrightarrow{t_1} \cdots$, the trace of the path is 
the sequence of states $trace(\rho) = s_0 s_1 \ldots$. Let $\rho(n) = s_n$ be the 
$n$-th state in the path and $\rho[n] = t_n$ be the corresponding exit time for $s_n$. 
For a finite path 
$\rho = s_0 \xrightarrow{t_0} s_1 \xrightarrow{t_1} \cdots \xrightarrow{t_{n-1}} s_n$, 
we use $T(\rho) = \sum_{i=0}^{n-1} t_i$ to denote the total time spent on this path 
if $n \geq 1$, otherwise $T(\rho) = 0$. For a time $t \leq T(\rho)$, 
$\rho(0 \ldots t)$ denotes the prefix of $\rho$ within $t$ time units, i.e., 
$s_0 \xrightarrow{t_0} s_1 \xrightarrow{t_1} \cdots \xrightarrow{t_{m-1}} s_m$ if there 
exists some $m \leq n$ with $\sum_{i=0}^{m-1} \rho[i] \leq t \wedge \sum_{i=0}^{m} \rho[i] > t$, 
otherwise $\epsilon$. 

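A basic cylinder set $C(s_0, I_0, \ldots, I_{n-1}, s_n)$ consists of all paths 
$\rho \in Path^{\mathcal{C}}$ such that $\rho(i) = s_i$ for $0 \leq i \leq n$, and 
$\rho[i] \in I_i$ for $0 \leq i < n$. Then the $\sigma$-algebra 
$\mathcal{F}_{s_0}(\mathcal{C})$ associated with CTMC $\mathcal{C}$ and initial state 
$s_0$ is the smallest $\sigma$-algebra that contains all cylinder sets 
$C(s_0, I_0, \ldots, I_{n-1}, s_n)$ with $\alpha(s_0) > 0$ and 
$P(s_i, s_{i+1}) > 0$ for $0 \leq i < n$, where $I_0, \ldots, I_{n-1}$ are non-empty 
intervals in $\mathbb{R}_{\geq 0}$. There is a unique probability measure 
$\Pr^{\mathcal{C}}$ on the $\sigma$-algebra $\mathcal{F}_{s_0}(\mathcal{C})$, by 
which the probability for a cylinder set is given by 

$$\Pr^{\mathcal{C}}\big(C(s_0, I_0, \ldots, I_{n-1}, s_n)\big) = \alpha(s_0) \cdot \prod_{i=1}^{n} \int_{I_{i-1}} E(s_{i-1})\, e^{-E(s_{i-1})x}\,dx \cdot P(s_{i-1}, s_i).$$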
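
The cylinder-set probability is a finite product, so it can be evaluated term by term. The sketch below assumes that each interval $I_i$ is given by its two endpoints and that $P$, $E$ and $\alpha$ are stored in dictionaries; the names `interval_mass` and `cylinder_prob` and the example chain are hypothetical.

```python
import math

def interval_mass(rate, lo, hi):
    """Integral of rate * exp(-rate * x) over the interval from lo to hi."""
    return math.exp(-rate * lo) - math.exp(-rate * hi)

def cylinder_prob(alpha, P, E, states, intervals):
    """alpha(s_0) times the product over i = 1..n of
    (mass of I_{i-1} under rate E(s_{i-1})) * P(s_{i-1}, s_i)."""
    if len(states) != len(intervals) + 1:
        raise ValueError("need exactly one interval per transition")
    prob = alpha.get(states[0], 0.0)
    for i in range(1, len(states)):
        prev, cur = states[i - 1], states[i]
        lo, hi = intervals[i - 1]
        prob *= interval_mass(E[prev], lo, hi) * P[prev].get(cur, 0.0)
    return prob

# Hypothetical CTMC, used only for illustration.
alpha = {"s0": 1.0}
P = {"s0": {"s1": 0.5, "s2": 0.5}, "s1": {"s1": 1.0}, "s2": {"s2": 1.0}}
E = {"s0": 2.0, "s1": 1.0, "s2": 1.0}
one_step = cylinder_prob(alpha, P, E, ["s0", "s1"], [(0.0, 1.0)])
```

For an unbounded interval, `math.inf` can be passed as the upper endpoint, since `math.exp` of negative infinity evaluates to 0.0.
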


Example 1. An example of CTMC is shown in Fig. 1, with AP = {a,b,c} and 
initial state so. The exit rate r;, i = 0,1,2,3 and transition probability are 
shown in the figure. 


Fig. 1. An example of CTMC 


2.2 Deterministic Timed Automaton (DTA) 


A timed automaton is a finite state graph equipped with a finite set of 
non-negative real-valued clock variables, or clocks for short. Clocks can only be 
reset to zero, or proceed with rate 1 as time progresses independently. Let 
$\mathcal{X} = \{x_1, \ldots, x_n\}$ be a set of clocks. A function 
$\eta: \mathcal{X} \to \mathbb{R}_{\geq 0}$ is an $\mathcal{X}$-valuation, which 
records for each clock the amount of time since its last reset. Let 
$Val(\mathcal{X})$ be the set of all clock valuations of $\mathcal{X}$. For a subset 
$X \subseteq \mathcal{X}$, the reset of $X$, denoted as $\eta[X := 0]$, is 
the valuation $\eta'$ such that $\eta'(x) = 0$ for all $x \in X$, and 
$\eta'(x) = \eta(x)$ otherwise. For $d \in \mathbb{R}_{\geq 0}$, 
$(\eta + d)(x) = \eta(x) + d$ for any clock $x \in \mathcal{X}$. 

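A clock constraint over $\mathcal{X}$ is a formula of the following form 

$$g ::= x < c \mid x \leq c \mid x > c \mid x \geq c \mid x - y \geq c \mid g \wedge g,$$

where $x, y$ are clocks and $c \in \mathbb{N}$. Let $Con(\mathcal{X})$ denote the set 
of clock constraints over $\mathcal{X}$. A valuation $\eta$ satisfies a guard $g$, 
denoted as $\eta \models g$, iff $\eta(x) \bowtie c$ when $g$ is $x \bowtie c$, where 
$\bowtie \in \{<, \leq, >, \geq\}$; and $\eta \models g_1$ and $\eta \models g_2$ iff 
$g = g_1 \wedge g_2$. 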


Definition 3 (DTA). A DTA is a tuple 
$\mathcal{A} = (\Sigma, \mathcal{X}, Q, q_0, Q_F, \to)$, where 

- $\Sigma$ is a finite set of actions; 
- $\mathcal{X}$ is a finite set of clocks; 
- $Q$ is a finite set of locations; 
- $q_0 \in Q$ is the initial location; 
- $Q_F \subseteq Q$ is the set of accepting locations; 
- $\to \subseteq (Q \setminus Q_F) \times \Sigma \times Con(\mathcal{X}) \times 2^{\mathcal{X}} \times Q$ 
  is the transition relation, satisfying: if $q \xrightarrow{a,g,X} q'$ and 
  $q \xrightarrow{a,g',X'} q''$ with $q' \neq q''$, then $g \cap g' = \emptyset$. 

Each transition, or edge, $q \to q'$ in $\mathcal{A}$ is endowed with $(a, g, X)$, 
where $a \in \Sigma$ is an action, $g \in Con(\mathcal{X})$ is the guard of the 
transition, and $X \subseteq \mathcal{X}$ is a set of clocks, which are reset to 0 
after the transition. An intuitive interpretation of the transition is that 
$\mathcal{A}$ can move from $q$ to $q'$ by taking action $a$ and resetting all clocks 
in $X$ to 0 only if $g$ is satisfied. There are no outgoing transitions from any 
accepting location in $Q_F$. 

A finite timed path of $\mathcal{A}$ is of the form 
$\theta = q_0 \xrightarrow{a_0,t_0} q_1 \xrightarrow{a_1,t_1} \cdots \xrightarrow{a_{n-1},t_{n-1}} q_n$, 
where $t_i > 0$ for $i = 0, \ldots, n-1$. Moreover, there exists a sequence of 
transitions $q_j \xrightarrow{a_j,g_j,X_j} q_{j+1}$, for $0 \leq j \leq n-1$, such that 
$\eta_0 = 0$, $\eta_j + t_j \models g_j$ and 
$\eta_{j+1} = (\eta_j + t_j)[X_j := 0]$, where $\eta_i$ denotes the clock valuation 
when entering $q_i$. The path $\theta$ is said to be accepted by $\mathcal{A}$ if there 
exists a state $q_i \in Q_F$ for some $0 \leq i \leq n$. As usual, it is assumed that 
all DTA are non-Zeno [6], that is, any circular transition sequence takes nonzero 
dwelling time. 
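
The definitions of valuations, guards, resets and accepted timed paths can be made concrete with a small interpreter. The sketch below is only an illustration under simplifying assumptions: guards are conjunctions of atoms of the form `(clock, op, constant)` (difference constraints are omitted), actions are plain strings, and the function names as well as the example automaton are hypothetical.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def satisfies(valuation, guard):
    """guard is a conjunction, given as a list of atoms (clock, op, constant)."""
    return all(OPS[op](valuation[x], c) for (x, op, c) in guard)

def accepts(edges, q0, accepting, clocks, timed_word):
    """edges: list of (source, action, guard, resets, target).
    timed_word: list of (action, delay) pairs with delay > 0."""
    q = q0
    eta = {x: 0.0 for x in clocks}                      # eta_0 = 0
    if q in accepting:
        return True
    for action, delay in timed_word:
        eta = {x: v + delay for x, v in eta.items()}    # let time pass
        enabled = [e for e in edges
                   if e[0] == q and e[1] == action and satisfies(eta, e[2])]
        if not enabled:
            return False                                # the run is stuck
        _, _, _, resets, q = enabled[0]                 # determinism: one edge
        eta = {x: (0.0 if x in resets else v) for x, v in eta.items()}
        if q in accepting:
            return True
    return False

# Hypothetical single-clock DTA, used only for illustration.
edges = [
    ("q0", "a", [("x", "<", 1)], set(), "q0"),
    ("q0", "b", [("x", ">", 1)], set(), "q1"),
]
run_ok = accepts(edges, "q0", {"q1"}, ["x"], [("a", 0.4), ("b", 0.8)])
run_stuck = accepts(edges, "q0", {"q1"}, ["x"], [("b", 0.5)])
```
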

A region is a set of valuations, usually represented by a set of clock 
constraints. Let $Reg(\mathcal{X})$ be the set of regions over $\mathcal{X}$. Given 
$\Theta, \Theta' \in Reg(\mathcal{X})$, $\Theta'$ is called a successor of $\Theta$ if 
for all $\eta \models \Theta$, there exists $t > 0$ such that 
$\eta + t \models \Theta'$ and $\forall t' < t$, $\eta + t' \models \Theta \vee \Theta'$. 
A region $\Theta$ satisfies a guard $g$, denoted as $\Theta \models g$, 
iff $\eta \models \Theta$ implies $\eta \models g$ for all $\eta$. The reset operation 
on a region $\Theta$ is defined as 
$\Theta[X := 0] = \{\eta[X := 0] \mid \eta \models \Theta\}$. Then the region graph, 
viewed as a quotient transition system related to clock equivalence [6], can be 
defined as follows: 


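Definition 4 (Region Graph). The region graph for DTA 
$\mathcal{A} = (\Sigma, \mathcal{X}, Q, q_0, Q_F, \to)$ is a tuple 
$\mathcal{G}(\mathcal{A}) = (\Sigma, \mathcal{X}, \bar{Q}, \bar{q}_0, \bar{Q}_F, \mapsto)$, where 

- $\bar{Q} = Q \times Reg(\mathcal{X})$ is the set of states; 
- $\bar{q}_0 = (q_0, \mathbf{0}) \in \bar{Q}$ is the initial state; 
- $\bar{Q}_F \subseteq Q_F \times Reg(\mathcal{X})$ is the set of final states; 
- $\mapsto \subseteq \bar{Q} \times ((\Sigma \times 2^{\mathcal{X}}) \cup \{\delta\}) \times \bar{Q}$ 
  is the transition relation satisfying 
  - $(q, \Theta) \overset{\delta}{\mapsto} (q, \Theta')$ if $\Theta'$ is a successor of $\Theta$; 
  - $(q, \Theta) \overset{a,X}{\mapsto} (q', \Theta'')$ if there exists 
    $g \in Con(\mathcal{X})$ and a transition $q \xrightarrow{a,g,X} q'$ 
    such that $\Theta \models g$ and $\Theta'' = \Theta[X := 0]$. 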

Example 2 (Adapted from [10] ). Figure 2 presents an example of DTA and Fig. 3 
gives its region graph, in which double circle and double rectangle stand for final 
states, respectively. 


Fig. 2. A DTA A 

Fig. 3. The region graph of A 


2.3 Piecewise-Deterministic Markov Process (PDP) 


Piecewise-deterministic Markov Processes (PDPs for short) [14] cover a wide 
range of stochastic models in which the randomness appears as discrete events 
at fixed or random times, whose evolution is deterministically governed by an 
ODE system between these times. A PDP consists of a mixture of deterministic 
motion and random jumps between a finite set of locations. While staying in 
a location, a PDP evolves deterministically following a flow function, which is a 
solution to an ODE system. A PDP can jump between locations either randomly, 
in which case the residence time of a location is governed by an exponential 
distribution, or when the location invariant is violated. The successor state of 
the jump follows a probability measure depending on the current state. A PDP 
is right-continuous and has the strong Markov property [14]. 


Definition 5 (PDP [14]). A PDP is a tuple 
$Q = (Z, \mathcal{X}, Inv, \phi, \Lambda, \mu)$ with 

- $Z$ is a finite set of locations; 
- $\mathcal{X}$ is a finite set of variables; 
- $Inv: Z \to 2^{\mathbb{R}^{|\mathcal{X}|}}$ is an invariant function; 
- $\phi: Z \times \mathbb{R}^{|\mathcal{X}|} \times \mathbb{R}_{\geq 0} \to \mathbb{R}^{|\mathcal{X}|}$ 
  is a flow function, which is a solution of a system of ODEs with Lipschitz 
  continuous vector fields; 
- $\Lambda: S \to \mathbb{R}_{\geq 0}$ is an exit rate function; 
- $\mu: \bar{S} \to P_r(S)$ is the transition probability function, where 
  $S = \{\xi := (z, \eta) \mid z \in Z, \eta \models Inv(z)\}$ is the state space of 
  $Q$, $\bar{S}$ is the closure of $S$, 
  $S^{\circ} = \{(z, \eta) \mid z \in Z, \eta \models Inv(z)^{\circ}\}$ is the interior 
  of $S$, in which $Inv(z)^{\circ}$ stands for the interior of $Inv(z)$, and 
  $\partial S = \bigcup_{z \in Z} \{z\} \times \partial Inv(z)$ is the boundary of $S$, 
  in which $\partial Inv(z) = \overline{Inv(z)} \setminus Inv(z)^{\circ}$ and 
  $\overline{Inv(z)}$ is the closure of $Inv(z)$. 

For any $\xi = (z, \eta) \in S$, there is a $\delta(\xi) > 0$ such that 
$\Lambda(z, \phi(z, \eta, t))$ is integrable on $[0, \delta(\xi))$. 
$\mu(\xi)(A)$ is measurable for any $A \in \mathcal{F}(S)$, where $\mathcal{F}(S)$ is 
the smallest $\sigma$-algebra generated by 
$\{\bigcup_{z \in Z} \{z\} \times A_z \mid A_z \in \mathcal{F}(Inv(z))\}$, and 
$\mu(\xi)(\{\xi\}) = 0$. 

There are two ways to take transitions between locations in a PDP $Q$. A PDP 
$Q$ is allowed to stay in its current location $z$ only if $Inv(z)$ is satisfied. 
During its residence, the valuation $\eta$ evolves time-dependently according to the 
flow function. Let $\xi \oplus t = (z, \phi(z, \eta, t))$ be the successor state of 
$\xi = (z, \eta)$ after residing $t$ time units in $z$. Thus, $Q$ is 
piecewise-deterministic since its behavior is determined by the flow function $\phi$ 
in each location. In a state $\xi = (z, \eta)$ with $\eta \models Inv(z)^{\circ}$, the 
PDP $Q$ can either evolve to a state $\xi' = \xi \oplus t$ by delaying $t$ time 
units, or take a Markovian jump to $\xi'' = (z'', \eta'') \in S$ with probability 
$\mu(\xi)(\{\xi''\})$. When $\eta \models \partial Inv(z)$, $Q$ is forced to take a 
boundary jump to $\xi'' = (z'', \eta'') \in S$ with probability $\mu(\xi)(\{\xi''\})$. 


3 Reduction to the Reachability Probability of EPDP 


As proved in [10], model-checking of a given CTMC $\mathcal{C}$ against a linear 
real-time property expressed by a DTA $\mathcal{A}$, i.e., determining 
$\Pr(\mathcal{C} \models \mathcal{A})$, can be reduced to computing the reachability 
probability of the product of $\mathcal{C}$ and $\mathcal{G}(\mathcal{A})$. This can 
be further reduced to computing the reachability probability of the embedded 
PDP (EPDP) of the product. Efficiently computing the reachability probability of 
the EPDP remains challenging, however, as existing approaches [7, 10, 16] can only 
handle DTA with one clock. We address this challenge in this paper. To keep the 
paper self-contained, we restate the reduction reported in [10] in this section. 

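A path $\rho = s_0 \xrightarrow{t_0} s_1 \xrightarrow{t_1} \cdots$ of CTMC 
$\mathcal{C}$ is accepted by DTA $\mathcal{A}$ if 
$\theta = q_0 \xrightarrow{L(s_0),t_0} q_1 \xrightarrow{L(s_1),t_1} \cdots \xrightarrow{L(s_{n-1}),t_{n-1}} q_n$ 
induced by some prefix of $\rho$ is an accepting path of $\mathcal{A}$. Then 
$\Pr(\mathcal{C} \models \mathcal{A}) = \Pr\{\rho \in Path^{\mathcal{C}} \mid \rho \text{ is accepted by } \mathcal{A}\}$. 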


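Definition 6 (Product Region Graph [7]). The product of CTMC 
$\mathcal{C} = (S, P, \alpha, AP, L, E)$ and the region graph of DTA 
$\mathcal{G}(\mathcal{A}) = (\Sigma, \mathcal{X}, \bar{Q}, \bar{q}_0, \bar{Q}_F, \mapsto)$, 
denoted by $\mathcal{C} \otimes \mathcal{G}(\mathcal{A})$, is a tuple 
$(\mathcal{X}, V, \alpha', V_F, \to, \Lambda)$, where 

- $V = S \times \bar{Q}$ is the state space; 
- $\alpha'(s, \bar{q}_0) = \alpha(s)$ is the initial distribution; 
- $V_F = S \times \bar{Q}_F$ is the set of accepting states; 
- $\to \subseteq V \times (([0,1] \times 2^{\mathcal{X}}) \cup \{\delta\}) \times V$ is 
  the smallest relation satisfying 
  - $(s, \bar{q}) \xrightarrow{\delta} (s, \bar{q}')$ (called delay transition), if 
    $\bar{q} \overset{\delta}{\mapsto} \bar{q}'$; 
  - $(s, \bar{q}) \xrightarrow{p,X} (s'', \bar{q}'')$ (called Markovian transition), if 
    $P(s, s'') = p$, $p > 0$ and $\bar{q} \overset{L(s),X}{\mapsto} \bar{q}''$; 
- $\Lambda: V \to \mathbb{R}_{\geq 0}$ is the exit rate function, where 
  $\Lambda(s, \bar{q}) = E(s)$ if there exists a Markovian transition from 
  $(s, \bar{q})$, and $\Lambda(s, \bar{q}) = 0$ otherwise. 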


Remark 1. Note that the definition of region graph here is slightly different from 
the usual one in the sense that Markovian transitions starting from a boundary 
do not contribute to the reachability probability. Therefore we can merge the 
boundary into its unique delay successor. 


Example 3 (Adapted from [10]). Figure 4 shows the product region graph of 
CTMC C in Example 1 and DTA A in Example 2. The graph can be split into 
three subgraphs in a column-wise manner, where all transitions within a 
subgraph are probabilistic, all transitions that evolve to the next subgraph are 
delay transitions, and transitions with reset lead to a state in the first subgraph. 
For conciseness, the location $v_9$ stands for all nodes that may be reached by a 
Markovian transition yet cannot reach an accepting node. 


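Proposition 1 ([10]). For CTMC $\mathcal{C}$ and DTA $\mathcal{A}$, 
$\Pr(\mathcal{C} \models \mathcal{A})$ is measurable and 

$$\Pr(\mathcal{C} \models \mathcal{A}) = \Pr^{\mathcal{C} \otimes \mathcal{G}(\mathcal{A})}\big(Path^{\mathcal{C} \otimes \mathcal{G}(\mathcal{A})}(\Diamond V_F)\big).$$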


Fig. 4. Product region graph C ⊗ G(A) of CTMC C in Example 1 and DTA A in 
Example 2 


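When treated as a stochastic process, $\mathcal{C} \otimes \mathcal{G}(\mathcal{A})$ 
can be interpreted as a PDP. In this way, computing the reachability probability of 
the accepting states $V_F$ in $\mathcal{C} \otimes \mathcal{G}(\mathcal{A})$ can be 
reduced to computing the time-unbounded reachability probability in the EPDP 
of $\mathcal{C} \otimes \mathcal{G}(\mathcal{A})$. 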
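Definition 7 (EPDP, [7]). Given 
$\mathcal{C} \otimes \mathcal{G}(\mathcal{A}) = (\mathcal{X}, V, \alpha', V_F, \to, \Lambda)$, 
the EPDP $Q^{\mathcal{C} \otimes \mathcal{A}}$ is a tuple 
$(\mathcal{X}, V, Inv, \phi, \Lambda, \mu)$ where for any 
$v = (s, (q, \Theta)) \in V$ 

- $Inv(v) = \Theta$, and $S = \{(v, \eta) \mid v \in V, \eta \models Inv(v)\}$ is the 
  state space; 
- $\phi(v, \eta, t) = \eta + t$ for $\eta \models Inv(v)$; 
- $\Lambda(v, \eta) = \Lambda(v)$ is the exit rate of $(v, \eta)$; 
- Boundary jump: for each delay transition $v \xrightarrow{\delta} v'$ in 
  $\mathcal{C} \otimes \mathcal{G}(\mathcal{A})$, $\mu(\xi, \{\xi'\}) = 1$ 
  whenever $\xi = (v, \eta)$, $\xi' = (v', \eta)$ and $\eta \models \partial Inv(v)$; 
- Markovian transition jump: for each Markovian transition 
  $v \xrightarrow{p,X} v''$ in $\mathcal{C} \otimes \mathcal{G}(\mathcal{A})$, 
  $\mu(\xi, \{\xi''\}) = p$ whenever $\xi = (v, \eta)$, $\eta \models Inv(v)$ and 
  $\xi'' = (v'', \eta[X := 0])$. 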


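The flow function here describes that all clocks increase with a uniform rate 
(i.e., $\dot{x}_1 = 1, \ldots, \dot{x}_n = 1$, or simply $\dot{\mathcal{X}} = 1$) at 
all locations. The original reachability problem is then reduced to the reachability 
probability of the set $\{(v, \eta) \mid v \in V_F, \eta \models Inv(v)\}$, given the 
initial state $(v_0, \mathbf{0})$ and the EPDP $Q^{\mathcal{C} \otimes \mathcal{A}}$. 
Let $\Pr_v^{\mathcal{C} \otimes \mathcal{A}}(\eta)$ stand for the probability to reach 
the final states $(V_F \times *)$ from $(v, \eta)$ in 
$Q^{\mathcal{C} \otimes \mathcal{A}}$. Thus, 
$\Pr_v^{\mathcal{C} \otimes \mathcal{A}}(\eta)$ can be computed recursively by 

$$\Pr_v^{\mathcal{C} \otimes \mathcal{A}}(\eta) = \begin{cases} \Pr_{v,\delta}^{\mathcal{C} \otimes \mathcal{A}}(\eta) + \sum_{v \xrightarrow{p,X} v'} \Pr_{v,v'}^{\mathcal{C} \otimes \mathcal{A}}(\eta), & \text{if } v \notin V_F \\ 1, & \text{if } v \in V_F \wedge \eta \models Inv(v) \\ 0, & \text{otherwise.} \end{cases} \quad (1)$$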

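Let $t_z^*(v, \eta)$ denote the minimal time for $Q^{\mathcal{C} \otimes \mathcal{A}}$ 
to reach $\partial Inv(v)$ from $(v, \eta)$. More precisely, 

$$t_z^*(v, \eta) = \inf\{t \mid \phi(v, \eta, t) \models \partial Inv(v)\}.$$

$\Pr_{v,\delta}^{\mathcal{C} \otimes \mathcal{A}}(\eta)$ is the probability that, from 
$(v, \eta)$, the process delays, then takes a forced jump to 
$(v', \eta + t_z^*(v, \eta))$, and from there evolves to an accepting state. It can be 
recursively computed by 

$$\Pr_{v,\delta}^{\mathcal{C} \otimes \mathcal{A}}(\eta) = \exp(-\Lambda(v)\, t_z^*(v, \eta)) \cdot \Pr_{v'}^{\mathcal{C} \otimes \mathcal{A}}(\eta + t_z^*(v, \eta)).$$

$\Pr_{v,v'}^{\mathcal{C} \otimes \mathcal{A}}(\eta)$ is the probability that a 
Markovian transition $v \xrightarrow{p,X} v'$ happens within $t_z^*(v, \eta)$ time 
units, and that the process from there evolves to an accepting state. It can be 
recursively computed by 

$$\Pr_{v,v'}^{\mathcal{C} \otimes \mathcal{A}}(\eta) = \int_0^{t_z^*(v, \eta)} p \cdot \Lambda(v) \exp(-\Lambda(v) s) \cdot \Pr_{v'}^{\mathcal{C} \otimes \mathcal{A}}((\eta + s)[X := 0])\, ds.$$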

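Computing $\Pr(\mathcal{C} \models \mathcal{A})$ is reduced to computing 
$\Pr_{v_0}^{\mathcal{C} \otimes \mathcal{A}}(\mathbf{0})$, which is equivalent to 
computing the least fixed point of Eq. (1). That is, 


Theorem 1 ([10]). For CTMC $\mathcal{C}$ and DTA $\mathcal{A}$, 
$\Pr(\mathcal{C} \models \mathcal{A}) = \Pr^{\mathcal{C} \otimes \mathcal{A}}\{Path^{\mathcal{C} \otimes \mathcal{A}}(\Diamond V_F)\}$ 
is the least fixed point of (1). 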
Remark 2. Generally, it is difficult to solve a recursive equation like (1). As 
an alternative, we discuss the augmented EPDP of 
$Q^{\mathcal{C} \otimes \mathcal{A}}$ obtained by replacing $\mathcal{A}$ with a 
bounded DTA derived from $\mathcal{A}$. Using the extended generator 
of the augmented EPDP, we can then derive a partial differential equation (PDE) 
whose solution is the reachability probability. We elaborate on this idea in the 
subsequent section. 


4 Approximating the Reachability Probability of EPDP 


In this section, we present a numerical method to approximate 
$\Pr_{v_0}^{\mathcal{C} \otimes \mathcal{A}}(\mathbf{0})$, since, as discussed 
previously, computing it exactly is in general impossible or at least too expensive. 
We first introduce the basic idea of our approach in detail, and then 
discuss its time complexity and convergence property. A key point is that our 
approach exploits the observation that the flow function of 
$Q^{\mathcal{C} \otimes \mathcal{A}}$ is linear, only depends on time $t$, and is the 
same at all locations. This enables us to reduce computing 
$\Pr_{v_0}^{\mathcal{C} \otimes \mathcal{A}}(\mathbf{0})$ to solving an ODE system. 


4.1 Reduction to a PDE System 


In this subsection, we first show that 
$\Pr_{v_0}^{\mathcal{C} \otimes \mathcal{A}}(\mathbf{0})$ can be approximated by the 
corresponding probability in the EPDP of $\mathcal{C}$ and a bounded DTA derived from 
$\mathcal{A}$, i.e., a DTA in which the duration of all accepting paths is bounded. 
We then show that the latter can be reduced to solving a PDE system. 

Given a DTA $\mathcal{A}$, we construct a bounded DTA $\mathcal{A}[t_f]$ by 
introducing a new clock $y$, adding a timing constraint $y < t_f$ to the guard of 
each transition of $\mathcal{A}$ ingoing to an accepting state in $Q_F$, and never 
resetting $y$, where $t_f \in \mathbb{N}$ is a parameter. So, the length of all 
accepting paths of $\mathcal{A}[t_f]$ is time-bounded by $t_f$. 
Obviously, $Path^{\mathcal{C}}(\mathcal{A}[t_f])$ is a subset of 
$Path^{\mathcal{C}}(\mathcal{A})$. As $\Pr(\mathcal{C} \models \mathcal{A})$ is 
measurable and $Q^{\mathcal{C} \otimes \mathcal{A}}$ is Borel right continuous, we have 
the following proposition. 


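Proposition 2. Given a CTMC $\mathcal{C}$, a DTA $\mathcal{A}$, and 
$t_f \in \mathbb{N}$, 

$$\lim_{t_f \to \infty} \Pr(\mathcal{C} \models \mathcal{A}[t_f]) = \Pr(\mathcal{C} \models \mathcal{A}). \quad (2)$$

Moreover, if $\mathcal{C}$ is weakly irreducible or satisfies some conditions (please 
refer to Chap. 4 of [26] for details), then there exist positive constants 
$K, K_0 \in \mathbb{R}_{>0}$ such that 

$$\Pr(\mathcal{C} \models \mathcal{A}) - \Pr(\mathcal{C} \models \mathcal{A}[t_f]) \leq K \exp\{-K_0 t_f\}. \quad (3)$$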


Remark 3. (2) was first observed in [7], where the authors pointed out the 
feasibility of using a bounded system to approximate the original unbounded 
system in order to simplify a verification obligation. (3) further indicates that 
this approximation converges exponentially in $t_f$ if the CTMC is weakly 
irreducible. 
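
Bound (3) can be inverted to choose the time bound for a desired truncation error $\varepsilon$: $K \exp\{-K_0 t_f\} \leq \varepsilon$ holds iff $t_f \geq \ln(K/\varepsilon)/K_0$. The sketch below assumes that the constants $K$ and $K_0$ are already known; the source does not give concrete values, so the numbers used here are purely hypothetical, as is the function name.

```python
import math

def min_time_bound(K, K0, eps):
    """Smallest natural number t_f with K * exp(-K0 * t_f) <= eps."""
    if K <= 0 or K0 <= 0 or eps <= 0:
        raise ValueError("K, K0 and eps must be positive")
    if eps >= K:
        return 0  # the bound already holds for t_f = 0
    return math.ceil(math.log(K / eps) / K0)

# Hypothetical constants, chosen only for illustration.
t_f = min_time_bound(K=1.0, K0=0.5, eps=1e-6)
```
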
For a path starting in a state $(v, \eta)$ at time $y$, we use 
$Path^{y}_{v,\eta}[t]$ to denote the set of its locations at time $t$, and 
$\hbar_v(y, \eta) = \Pr(Path^{y}_{v,\eta}[t_f] \in V_F) = \mathbb{E}(\mathbf{1}_{Path^{y}_{v,\eta}[t_f] \in V_F})$ 
as the probability of a path reaching $V_F$ within $t_f$ time units, 
where $\mathbf{1}_{Path^{y}_{v,\eta}[t_f] \in V_F}$ is the indicator function of 
$Path^{y}_{v,\eta}[t_f] \in V_F$. Then, 
$\hbar_{v_0}(0, \mathbf{0}) = \Pr(\mathcal{C} \models \mathcal{A}[t_f])$ is the 
probability to reach the set of accepting states from the initial state 
$(v_0, \mathbf{0})$, which satisfies the following system of PDEs. 


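Theorem 2. Given a CTMC $\mathcal{C}$, a bounded DTA $\mathcal{A}[t_f]$, and the EPDP 
$Q^{\mathcal{C} \otimes \mathcal{A}[t_f]} = (\mathcal{X}, V, Inv, \phi, \Lambda, \mu)$, 
$\hbar_{v_0}(0, \mathbf{0})$ is the unique solution of the following system of PDEs: 

$$\frac{\partial \hbar_v(y, \eta)}{\partial y} + \sum_{i=1}^{|\mathcal{X}|} \frac{\partial \hbar_v(y, \eta)}{\partial \eta^{(i)}} + \Lambda(v) \cdot \sum_{v \xrightarrow{p,X} v'} p \cdot \big(\hbar_{v'}(y, \eta[X := 0]) - \hbar_v(y, \eta)\big) = 0, \quad (4)$$

where $v \in V \setminus V_F$, $\eta \models Inv(v)$, $\eta^{(i)}$ is the $i$-th clock 
variable and $y \in [0, t_f)$. The boundary conditions are: 

(i) $\hbar_v(y, \eta) = \hbar_{v'}(y, \eta)$, for every $\eta \models \partial Inv(v)$ and transition $v \xrightarrow{\delta} v'$; 
(ii) $\hbar_v(y, \eta) = 1$, for every vertex $v \in V_F$, $\eta \models Inv(v)$, and $y \in [0, t_f)$; 
(iii) $\hbar_v(t_f, \eta) = 0$, for every vertex $v \in V \setminus V_F$ and $\eta \models Inv(v) \cup \partial Inv(v)$. 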


Remark 4. Note that the PDE system (4) in Theorem 2 is different from the one 
presented in [10] for reducing 
$\Pr_{v_0}^{\mathcal{C} \otimes \mathcal{A}}(\mathbf{0})$. In particular, the boundary 
condition in [10] has been corrected here. 


4.2 Reduction to an ODE System 


There are several classical methods to solve PDEs. The finite element method, which 
is a numerical technique for solving PDEs as well as integral equations, is a 
prominent one, of which different versions have been established to solve different 
PDEs with specific properties. Other numerical methods include the finite difference 
method and the finite volume method; the reader is referred to [20,21] 
for details. Thanks to the special form of Eq. (4), we are able to obtain a 
numerical solution in a more efficient way. 

The fact that the flow function (which is the solution to the ODE system 
$\bigwedge_{x \in \mathcal{X}} \dot{x} = 1 \wedge \dot{y} = 1$) is the same at all 
locations of the EPDP $Q^{\mathcal{C} \otimes \mathcal{A}[t_f]}$ suggests that the 
partial derivatives with respect to $\eta$ and $y$ on the left-hand side of (4) evolve 
at the same pace. Thus, we can view all clocks as an array, and reformulate (4) as 

$$\frac{\partial \hbar_v(y, \eta)}{\partial y} + \frac{\partial \hbar_v(y, \eta)}{\partial \eta} \bullet \mathbf{1} + \Lambda(v) \cdot \sum_{v \xrightarrow{p,X} v'} p \cdot \big(\hbar_{v'}(y, \eta[X := 0]) - \hbar_v(y, \eta)\big) = 0, \quad (5)$$

where $\bullet$ stands for the inner product of two vectors of the same dimension, 
e.g., $(a_1, \ldots, a_n) \bullet (b_1, \ldots, b_n) = \sum_{i=1}^{n} a_i b_i$, and 
$\mathbf{1}$ for the vector $(1, \ldots, 1)$ with $n$ entries. 

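By Theorem 2, there exist $v_0$, $y_0$ and $\eta_0$ such that $v_0 \in V_F$, $y_0 = t_f$, and
$\eta_0 \in \mathit{Inv}(v) \cup \partial\mathit{Inv}(v)$. Besides, by the definition of $\mathcal{Q}^{\mathcal{C}\otimes\mathcal{G}(\mathcal{A}[t_f])}$, it follows that $\frac{dz}{dt} = 1$,
which implies $dz = dt$, for any $z \in \{y\} \cup X$. Hence, we can simplify (5) as the
following ODE system: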


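$$\frac{d\hbar_v((y_0,\eta_0)+t)}{dt} + \Lambda(v) \cdot \sum_{v \xrightarrow{p,X} v'} p \cdot \big(\hbar_{v'}(((y_0,\eta_0)+t)[X := 0]) - \hbar_v((y_0,\eta_0)+t)\big) = 0, \qquad (6)$$

with the initial condition $v_0 \in V_F$, $y_0 = t_f$, and $\eta_0 \in \mathit{Inv}(v) \cup \partial\mathit{Inv}(v)$, where
$v \in V \setminus V_F$. Note that we compute the reachability probability by (6) backwards.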


4.3 Numerical Solution 


Since $\hbar_v((y_0,\eta_0)+t)$ satisfies an ODE, we can apply a discretization
method to (6) and obtain an approximation efficiently. To this end, the remaining
obstacle is how to deal with the reset part $\hbar_{v'}(y_0+t, (\eta_0+t)[X := 0])$. Notice
that $X \neq \emptyset \Rightarrow \mathit{sum}((\eta_0+t)[X := 0]) + (t_f - y_0 - t) < \mathit{sum}(\eta_0+t) + (t_f - y_0 - t)$,
where $\mathit{sum}(\eta) = \sum_{x \in X} \eta(x)$. So we just need to solve the ODE system starting
from $(t_f, \eta_0)$ using the descending order over $\mathit{sum}(\eta)$ in a backward manner.
In this way, all of the reset values needed for the current iteration have been
computed in the previous iterations. Therefore, for each iteration, the derivative
is fixed and easy to calculate.
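
The ordering argument can be checked on a small grid. The following Python sketch is only an illustration under our own assumptions: clock valuations are integer tuples (multiples of the discretization step), the helper names `reset`, `grid_by_sum` and `resets_precede` are hypothetical, and a reset is only counted when it actually changes the valuation (i.e. some reset clock is positive).

```python
from itertools import product

def reset(eta, X):
    """Return eta[X := 0]: clocks whose index is in X are set to 0."""
    return tuple(0 if i in X else v for i, v in enumerate(eta))

def grid_by_sum(n_clocks, steps):
    """All grid valuations with entries in 0..steps, sorted by sum(eta)."""
    points = product(range(steps + 1), repeat=n_clocks)
    return sorted(points, key=sum)

def resets_precede(n_clocks, steps, reset_sets):
    """Check that every effective reset target is visited before its source."""
    order = grid_by_sum(n_clocks, steps)
    position = {eta: k for k, eta in enumerate(order)}
    for eta in order:
        for X in reset_sets:
            target = reset(eta, X)
            # An effective reset strictly lowers the sum, so it must come earlier.
            if target != eta and position[target] >= position[eta]:
                return False
    return True
```

Visiting the valuations in the order returned by `grid_by_sum` therefore guarantees, in this toy setting, that a reset value is available whenever it is needed.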

We denote by $\delta$ the length of the discretization step; the total number of
discretization steps is $\lceil t_f/\delta \rceil \in \mathbb{N}$. An approximate solution to (4) can be computed
efficiently by the following algorithm.

Line 4 in Algorithm 1 computes a numerical solution to (6) on $[t_f - t, t_f]$
by discretizing $\frac{d\hbar_v((y_0,\eta_0)+t)}{dt}$ with $\frac{1}{\delta}\big(\hbar_v((y_0,\eta_0)+(t+\delta)) - \hbar_v((y_0,\eta_0)+t)\big)$.
A pictorial illustration of Algorithm 1 for the two-dimensional setting is shown
in Fig. 5. The blue polyhedron covers all the points we need to calculate. The
algorithm starts from $(0,0,t_f)$, where $\mathit{sum}(\eta) = x_1 + x_2 = 0$. Then $\mathit{sum}(\eta)$ is
incremented until $2t_f$ in a stepwise manner. For each fixed $\mathit{sum}(\eta)$, for example
$\mathit{sum}(\eta) = t_f$, the algorithm calculates all discrete points in the gray plane
following the direction $(-1,-1,-1)$, and finally reaches the two reset lines. The
red line reaching the origin provides the final result.

Algorithm 1. Finding numerical solution to (4)

Input: $\mathcal{C} \otimes \mathcal{G}(\mathcal{A})$, the region graph of the product of CTMC $\mathcal{C}$ and DTA $\mathcal{A}$; $t_f$, the
time bound

Output: A numerical solution for $\hbar_{v_0}(0,\mathbf{0})$, an approximation of $\Pr(\mathcal{C} \models \mathcal{A}[t_f])$

1: for $n \leftarrow 0$ to $|X| \cdot t_f$ by $\delta$ do
2:   for each $\eta$ in $\{\eta' \mid \mathit{sum}(\eta') = n \wedge \forall i \in \{1,\ldots,|X|\}.\ 0 \le \eta'^{(i)} \le t_f\}$ do
3:     for $t$ from 0 down to $-\min(t_f, \eta)$ do
4:       Compute numerical solution to (6) with $(y_0,\eta_0) = (t_f,\eta)$ on $[t_f - t, t_f]$
5:     end for
6:   end for
7: end for
8: return numerical solution for $\hbar_{v_0}(0,\mathbf{0})$
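
To make the discretization in Line 4 concrete, here is a minimal Python sketch for a toy scalar instance of (6). It is not the prototype described in this paper: we assume a single non-accepting vertex with rate `rate`, one Markovian transition with probability 1 whose reset term has the known constant value `reset_value`, and absolute time instead of the offset $t$. The function name `solve_backward` is hypothetical.

```python
import math

def solve_backward(rate, reset_value, t_f, delta):
    """Integrate dh/dt + rate * (reset_value - h) = 0 backwards from h(t_f) = 0.

    The derivative at t is replaced by (h(t + delta) - h(t)) / delta, which
    gives h(t) = (h(t + delta) + delta * rate * reset_value) / (1 + delta * rate).
    """
    steps = math.ceil(t_f / delta)
    h = 0.0                      # boundary condition (iii): h(t_f) = 0
    for _ in range(steps):       # walk from t_f back to 0
        h = (h + delta * rate * reset_value) / (1.0 + delta * rate)
    return h

approx = solve_backward(rate=1.0, reset_value=1.0, t_f=2.0, delta=0.01)
exact = 1.0 - math.exp(-2.0)     # closed form of this toy equation at t = 0
```

For this toy equation the closed form is $1 - e^{-\Lambda t_f}$, so `approx` and `exact` can be compared directly, and halving `delta` should shrink the gap.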


Fig. 5. Illustrating Algorithm 1 (left) and Algorithm 2 (right) for the 2-dimensional
setting (Color figure online)


Example 4. Consider the product $\mathcal{C} \otimes \mathcal{G}(\mathcal{A})$ shown in Example 3. For
state $v_3$, in which clock $x$ is 1 and $y$ is arbitrary, the corresponding PDE is


Ohvs(y, 1) , Ahvs(y, 1) 
Oy Ox 


H rol0.5-Avo (y, 0) + 0.2-Rog (y, 0) + 0.4-Rug (y, 0) — Rug (y, 0)] = 0. 


Since sum(y,0) = y < y+ 1 = sum(y,1), the value for h,,(y,0), Av, (y, 0) 
and fiy,(y,0) have been calculated in the previous iterations, thus the value for 
hy, (y, 1) can be computed. 


To optimize Algorithm 1 for multi-clock objects, we exploit the idea of
"lazy computation". In Algorithm 1, in order to determine the reset part for
(6), we calculate all discretized points generated by all ODEs. The efficiency
suffers since the number of ODEs is quite large (the same as the number
of states in the product automaton). In Algorithm 2, however, we only compute
the reset part that we need for computing $\hbar_{v_0}(0,\mathbf{0})$. If we meet a reset part
$\hbar_{v'}(y, \eta[X := 0])$ which has not been decided yet, we suspend the equation we
are computing and switch to computing the equation leading to the undecided
point, following the direction $(-1,\ldots,-1)$. The algorithm terminates
since the number of points it computes is no more than that of Algorithm 1. The
pseudo-code is given in Algorithm 2.


Algorithm 2. The lazy computation to find numerical solution to (4)

Input: $\mathcal{C} \otimes \mathcal{G}(\mathcal{A})$, the region graph of the product of CTMC $\mathcal{C}$ and DTA $\mathcal{A}$; $t_f$, the time bound
Output: A numerical solution for $\hbar_{v_0}(0,\mathbf{0})$, an approximation of $\Pr(\mathcal{C} \models \mathcal{A}[t_f])$

Procedure dhv($y$, $\eta$) // Computing numerical solution for $(y, \eta)$

1: for $t$ from 0 down to $-\min(t_f, \eta)$ by $\delta$ do
2:   for $v \in V$ do
3:     Check if $\eta$ satisfies initial and boundary condition from Theorem 2
4:     for each Markovian transition $v \xrightarrow{p,X} v'$ do
5:       $up = (-t - \delta) \cdot \mathbf{1} + ((t + \delta) \cdot \mathbf{1})[X := 0]$
6:       if reset exists and $\eta[X := 0] + up$ is undecided then
7:         call dhv($t_f$, $\eta[X := 0] + up$)
8:       end if
9:       compute $\hbar_{v'}$
10:    end for
11:  end for
12:  execute $\delta$-transition according to Theorem 2
13:  compute $\hbar_v((y_0,\eta_0)+t)$ by equation (6)
14: end for
15: mark $\eta$ decided
End Procedure

1: Call dhv($v_0$, $t_f$, $(t_f)$)
2: return numerical solution for $\hbar_{v_0}(0,\mathbf{0})$
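
The demand-driven idea behind Algorithm 2 can be sketched in Python as a memoized recursion. This is a toy model under our own assumptions and not the C prototype: valuations are integer tuples, each point depends on its predecessor along $(-1,\ldots,-1)$ and on its effective reset targets, and the "value" of a point is just a placeholder count. The names `deps`, `lazy_value` and `eager_values` are hypothetical.

```python
from itertools import product

def reset(eta, X):
    """Return eta[X := 0] for a set X of clock indices."""
    return tuple(0 if i in X else v for i, v in enumerate(eta))

def deps(eta, resets):
    """Points that must be decided before eta (toy dependency rule)."""
    out = set()
    if min(eta) >= 1:                      # one step back along (-1, ..., -1)
        out.add(tuple(v - 1 for v in eta))
    for X in resets:                       # reset targets that differ from eta
        r = reset(eta, X)
        if r != eta:
            out.add(r)
    return out

def lazy_value(eta, resets, decided):
    """Compute eta on demand, switching to undecided dependencies first."""
    if eta in decided:
        return decided[eta]
    val = 1 + sum(lazy_value(d, resets, decided) for d in deps(eta, resets))
    decided[eta] = val                     # mark eta decided
    return val

def eager_values(n_clocks, steps, resets):
    """Compute every grid point in ascending order of sum(eta)."""
    decided = {}
    for eta in sorted(product(range(steps + 1), repeat=n_clocks), key=sum):
        decided[eta] = 1 + sum(decided[d] for d in deps(eta, resets))
    return decided
```

With two clocks, three steps per clock and a single reset set `{0}`, the lazy variant decides only the points on the diagonal and on the reset line that the target actually needs, while the eager variant fills the whole grid.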


4.4 Complexity Analysis 


Let |S| be the number of the states of the CTMC, and n the number of the
clocks of the DTA. The worst-case time complexity of Algorithms 1 and 2 lies
in O(|V|- air), where |V| is the number of the equations in (4), i.e., the 
number of the locations in the product region graph, that are not accepting. 
The number of states in the region graph of the DTA is bounded by n! -2^-1 . 
I, ex (cs + 1), denoted by Cy, where c, is the maximum constant occurring in 
the guards that constrain zr. Note that C, differs from the bound given in [1], 
since the boundaries of a region do not matter in our setting and hence can be 
merged into the region. Thus, the number of states in the product region graph, 
as well as the number of PDE equations in Theorem 2, is at most C,- |S|. So the 
total complexity is O(Cy - |S| - [2 ]**). 

Let $\hbar_{v,n}(y_0,\eta_0)$ denote the numerical solution to ODE (6) with $t = -n\delta$,
and $\Lambda_{\max} = \max\{\Lambda(v_i) \mid 0 \le i < |S|\}$. Let $N = \lceil t_f/\delta \rceil$. By Proposition 2,


$\lim_{t_f \to \infty} \hbar_{v_0}(0,\mathbf{0}) = \Pr(\mathcal{C} \models \mathcal{A})$ and $\hbar_{v_0}(0,\mathbf{0})$ is monotonically increasing in $t_f$. In
the following proposition, for simplicity of discussion, we assume $t_f$ equal to $N\delta$.
Then, the error caused by discretization can be estimated as follows:
Proposition 3. For $N \in \mathbb{N}^+$ and $\delta = t_f/N$,

lf (56g 1) — ha, (0,0)] = O(8) 


For a function $f(\delta)$, $f$ is of the magnitude $O(\delta)$ if $\lim_{\delta \to 0} \frac{f(\delta)}{\delta} = C$, where $C$
is a constant. From Proposition 3, if we view $\Lambda_{\max}$ and $t_f$ as constants, then
the error is $O(\delta)$ in the step length $\delta$. By Proposition 2, the numerical solution
generated by Algorithm 1 converges to the reachability probability of $\mathcal{C} \otimes \mathcal{A}$, and
the error can be made as small as we expect if we decrease the size of discretization $\delta$
and increase the time bound $t_f$.
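
As a sanity check of the $O(\delta)$ behaviour, consider a toy scalar instance that is our own assumption and not one of the examples of this paper: a single equation with constant rate $\Lambda$, reset term equal to 1, and $N = t_f/\delta$ steps of the difference scheme used in Line 4 of Algorithm 1.

```latex
\begin{align*}
&\frac{dh}{dt} + \Lambda\,(1 - h) = 0,\quad h(t_f) = 0
  \;\Longrightarrow\; h(0) = 1 - e^{-\Lambda t_f},\\
&h_k = \frac{h_{k+1} + \delta\Lambda}{1 + \delta\Lambda},\quad h_N = 0
  \;\Longrightarrow\; h_0 = 1 - (1 + \delta\Lambda)^{-N},\\
&\bigl|h_0 - h(0)\bigr| = (1 + \delta\Lambda)^{-N} - e^{-\Lambda t_f}
  = \tfrac{1}{2}\,\Lambda^2\, t_f\, e^{-\Lambda t_f}\,\delta + O(\delta^2).
\end{align*}
```

The last step uses $N \ln(1+\delta\Lambda) = \Lambda t_f - \tfrac{1}{2}\Lambda^2 t_f\,\delta + O(\delta^2)$, so in this toy case the error is linear in $\delta$ with a constant depending only on $\Lambda$ and $t_f$.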


5 Experimental Results 


We implemented a prototype including Algorithms 1 and 2 in C, and a tool in Python
that takes a CTMC $\mathcal{C}$ and a DTA $\mathcal{A}$ as input and generates a .c file storing their
product, which is used as input to Algorithms 1 and 2. The first
two examples (Examples 5 and 6) come from [10] to show the feasibility of our
tool. The last case study is an example of robot navigation from [7]. In order to
demonstrate the scalability of our approach, we revise the example with different
real-time requirements, which require DTA with different numbers of clocks. The
examples are executed in Linux 16.04 LTS with an Intel(R) Core(TM) i7-4710HQ
2.50 GHz CPU and 16 GB RAM. The column "time" reports the running time
for Algorithm 1, and "time (lazy)" reports the running time for Algorithm 2. All
times are given in seconds.


Example 5. Consider Example 3 with $r_i = 1$, $i = 0,\ldots,3$ and $\delta = 0.01$; the
experimental result is shown in Table 1. The relative error between $t_f = 30$ and $t_f = 40$
is $5 \times 10^{-7}$.


Table 1. The experimental results for Examples 5 and 6

      | Example 5                                          | Example 6
$t_f$ | $\hbar_{v_0}(0,\mathbf{0})$ | time   | time (lazy) | $\hbar_{v_0}(0,\mathbf{0})$ | time   | time (lazy)
20    | 0.110791                    | 0.8070 | 0.7232      | 0.999999                    | 0.1685 | 0.0002
30    | 0.110792                    | 1.7246 | 1.6260      | 0.999999                    | 0.3453 | 0.0003
40    | 0.110792                    | 3.0344 | 2.8760      | 0.999999                    | 0.6265 | 0.0003


Example 6. Consider the reachability probability for the product of a CTMC
and a DTA as shown in Fig. 6. A part of its region graph is shown in Fig. 7. Set
$r_0 = r_1 = 1$, $\delta = 0.1$; the experimental result is given in Table 1. The relative
error between $t_f = 30$ and $t_f = 40$ is $1 \times 10^{-7}$. Note that even for this simple
example, none of the existing tools can handle it.

Fig. 6. The product automaton of Example 6

Fig. 7. The reachable product region graph of Fig. 6.


Example 7. Consider a robot moving on an $N \times N$ grid as shown in Fig. 8 (adapted
from [7]). It can move up, down, left and right. For each possible direction, the
robot moves with the same probability. The cells are grouped into A, B, C and
D. We consider the following real-time constraints:

$P_1$: The robot is allowed to stay in adjacent C-cells for at most $T_1$ time units,
and D-cells for at most $T_2$ time units;

$P_2$: The total time the robot continuously resides in adjacent C-cells and D-cells
is no more than $T_3$ time units, with $T_1 < T_3$ and $T_2 \le T_3$;

$P_3$: The total time the robot continuously resides in adjacent A-cells and C-cells
is no more than $T_4$ time units, with $T_1 < T_4$.

In this example, we verify whether the CTMC satisfies (i) $P_1$; (ii) $P_1 \wedge P_2$;
(iii) $P_1 \wedge P_2 \wedge P_3$. Obviously, $P_1$ can be expressed by a DTA with one clock, see
Fig. 9; to express $P_1 \wedge P_2$, a DTA with two clocks is necessary, see Fig. 10; to
express $P_1 \wedge P_2 \wedge P_3$, a DTA with three clocks is necessary, see Fig. 11.


Fig. 8. An example grid

Fig. 9. A DTA with one clock for $P_1$


The experimental results are summarized in Table 2. The relative error between
tf = 20 and t; = 21 is smaller than 10 ?. As can be seen, the running time 
of our approach heavily depends on the number of clocks. Compared with the

Fig. 10. A DTA with two clocks for $P_1 \wedge P_2$

Fig. 11. A DTA with three clocks for $P_1 \wedge P_2 \wedge P_3$


Table 2. Experimental results for the robot example with $\delta = 0.1$; running time longer
than 2700 s is denoted by 'TO' (timeout), the column "#(P)" counts the number of
states in the product automaton $\mathcal{C} \otimes \mathcal{G}(\mathcal{A})$, "time([7])" is the running time of the prototype
in [7] when precision = 0.01, $T_1 = T_2 = 3$, $T_3 = 5$, $T_4 = 7$

   |       | One clock                              | Two clocks                   | Three clocks
N  | $t_f$ | #(P) | time  | time (lazy) | time([7]) | #(P)  | time   | time (lazy) | #(P) | time  | time (lazy)
4  | 10    | 39   | 0.027 | 0.027       | 0.011     | 139   | 2.583  | 1.746       | 733  | 525.7 | 141.4
   | 15    |      | 0.049 | 0.043       |           |       | 7.117  | 3.445       |      | TO    | 257.35
   | 20    |      | 0.070 | 0.071       |           |       | 12.88  | 5.49        |      | TO    | 1583.76
10 | 10    | 232  | 0.167 | 0.164       | 0.087     | 968   | 39.41  | 25.92       | 5134 | TO    | 1039.7
   | 15    |      | 0.278 | 0.278       |           |       | 108.48 | 53.28       |      | TO    | TO
   | 20    |      | 0.417 | 0.421       |           |       | 226.56 | 89.50       |      | TO    | TO
20 | 10    | 940  | 1.142 | 0.909       | 1.23      | 4000  | 250.1  | 180.7       |      | TO    | TO
   | 15    |      | 1.65  | 1.54        |           |       | 672.8  | 375.6       |      | TO    | TO
   | 20    |      | 2.54  | 2.41        |           |       | 1326.8 | 616.1       |      | TO    | TO
30 | 10    | 2125 | 2.38  | 2.45        | 6.84      | 9120  | 812.9  | 380.5       |      | TO    | TO
   | 15    |      | 4.45  | 5.42        |           |       | 2058.1 | 770.8       |      | TO    | TO
   | 20    |      | 7.45  | 7.28        |           |       | TO     | 1283.4      |      | TO    | TO
40 | 10    | 3820 | 5.62  | 6.52        | 20.31     | 16395 | 1484.3 | 759.8       |      | TO    | TO
   | 15    |      | 11.97 | 11.02       |           |       | TO     | 1619.9      |      | TO    | TO
   | 20    |      | 15.26 | 16.17       |           |       | TO     | 2661.3      |      | TO    | TO


results reported in [7] for the case of one clock in this case study (when the
precision is set to be $10^{-2}$), our result is as fast as theirs, but their tool cannot
handle the cases of multiple clocks. In contrast, our approach can handle DTA
with multiple clocks, as indicated in the verification of $P_2$ and $P_3$. Algorithm 2
is much faster than Algorithm 1 when the number of clocks grows. To
the best of our knowledge, this is the first prototypical tool verifying CTMCs
against multi-clock DTA.


6 Concluding Remarks 


In this paper, we present a practical approach to verify CTMCs against DTA
objectives. First, the desired probability can be reduced to the reachability
probability of the product region graph in the form of PDPs. Then we use the
augmented PDP to approximate the reachability probability, in which the
reachability probability coincides with the solution to a PDE system at the starting point.
We further propose a numerical solution to the PDE system by reducing it to
an ODE system. The experimental results indicate the efficiency and scalability
of our approach compared with existing work, as it can handle DTA with multiple clocks.

As future work, it is worth investigating whether our approach also works
in the verification of CTMCs against more complicated real-time properties,
either expressed by timed automata and MTL as considered in [9], or by linear
duration invariants as considered in [8].

Abstract. Partial order reduction for timed systems is a challenging
topic due to the dependencies among events induced by time acting 
as a global synchronization mechanism. So far, there has only been 
a limited success in finding practically applicable solutions yielding 
significant state space reductions. We suggest a working and efficient 
method to facilitate stubborn set reduction for timed systems with urgent 
behaviour. We first describe the framework in the general setting of timed 
labelled transition systems and then instantiate it to the case of timed-arc 
Petri nets. The basic idea is that we can employ classical untimed partial 
order reduction techniques as long as urgent behaviour is enforced. Our 
solution is implemented in the model checker TAPAAL and the feature 
is now broadly available to the users of the tool. By a series of larger case 
studies, we document the benefits of our method and its applicability to 
real-world scenarios. 


1 Introduction 


Partial order reduction techniques for untimed systems, introduced by Godefroid,
Peled, and Valmari in the nineties (see e.g. [6]), have long since proved
successful in combating the notorious state space explosion problem. For timed
systems, the success of partial order reduction has been significantly challenged 
by the strong dependencies between events caused by time as a global synchro- 
nizer. Only recently—and moreover in combination with approximate abstrac- 
tion techniques—stubborn set techniques have demonstrated a true reduction 
potential for systems modelled by timed automata [23]. 

We pursue an orthogonal solution to the current partial order approaches 
for timed systems and, based on a stubborn set reduction [28,39], we target a 
general class of timed systems with urgent behaviour. In a modular modelling 
approach for timed systems, urgency is needed to realistically model behaviour in 
a component that should be unobservable to other components [36]. Examples 
of such instantaneously evolving behaviours include, among others, cases like 
behaviour detection in a part of a sensor (whose duration is assumed to be
negligible) or handling of release and completion of periodic tasks in a real-time
operating system. We observe that focusing on the urgent part of the behaviour 
of a timed system allows us to exploit the full range of partial order reduction 
techniques already validated for untimed systems. This leads to an exact and 
broadly applicable reduction technique, which we shall demonstrate on a series of 
industrial case studies showing significant space and time reduction. In order to 
highlight the generality of the approach, we first describe our reduction technique 
in the setting of timed labelled transition systems. We shall then instantiate it to 
timed-arc Petri nets and implement and experimentally validate it in the model 
checker TAPAAL [19]. 

Let us now briefly introduce the model of timed-arc Petri nets and explain
our reduction ideas. In timed-arc Petri nets, each token is associated with a
nonnegative integer representing its age and input arcs to transitions contain
intervals, restricting the ages of tokens available for transition firing (if an interval
is missing, we assume the default interval $[0, \infty]$ that accepts all token ages). In
Fig. 1a we present a simple monitoring system modelled as a timed-arc Petri
net. The system consists of two identical sensors where sensor $i$, $i \in \{1,2\}$, is
represented by the places $b_i$ and $m_i$, and the transitions $s_i$ and $r_i$. Once a token
of age 0 is placed into the place $b_i$, the sensor gets started by executing the
transition $s_i$ and moving the token from place $b_i$ to $m_i$ where the monitoring
process starts. As the place $b_i$ has an associated age invariant $\le 0$, meaning that
all tokens in $b_i$ must be of age at most 0, no time delay is allowed and the firing
of $s_i$ becomes urgent. In the monitoring place $m_i$ we have to delay one time unit
before the transition $r_i$ reporting the reading of the sensor becomes enabled.
Due to the age invariant $\le 1$ in the place $m_i$, we cannot wait longer than one
time unit, after which $r_i$ also becomes urgent.

The places $c_1$, $c_2$ and $c_3$ together with the transitions $i_1$, $i_2$ and $t$ are used to
control the initialization of the sensors. At the execution start, only the transition
$i_1$ is enabled and because it is an urgent transition (denoted by the white circle),
no delay is initially possible and $i_1$ must be fired immediately while removing
the token of age 0 from $c_1$ and placing a new token of age 0 into $c_2$. At the
same time, the first sensor gets started as $i_1$ also places a fresh token of age 0
into $b_1$. Now the control part of the net can decide to fire without any delay the
transition $i_2$ and start the second sensor, or it can delay one unit of time after
which $i_2$ becomes urgent due to the age invariant $\le 1$ as the token in $c_2$ is now
of age 1. If $i_2$ is fired now, it will place a fresh token of age 0 into $b_2$. However,
the token that is moved from $c_2$ to $c_3$ by the pair of transport arcs with the
diamond-shaped arrow tips preserves its age 1, so now we have to wait precisely
one more time unit before $t$ becomes enabled. Moreover, before $t$ can be fired,
the places $m_1$ and $m_2$ must be empty as otherwise the firing of $t$ is disabled due
to inhibitor arcs with circle-shaped arrow tips.

In Fig. 1b we represent the reachable state space of the simple monitoring
system where markings are represented using the notation like $c_3 : 1 + b_2 : 2$ that
stands for one token of age 1 in place $c_3$ and one token of age 2 in place $b_2$. The
dashed boxes represent the markings that can be avoided during the state space
exploration when we apply our partial order reduction method for checking if
the termination transition $t$ can become enabled from the initial marking. We
can see that the partial order reduction is applied such that it preserves at least
one path to all configurations where our goal is reached (transition $t$ is enabled)
and where time is not urgent anymore (i.e. to the configurations that allow the
delay of 1 time unit). The basic idea of our approach is to apply the stubborn
set reduction on the commutative diamonds where time is not allowed to elapse.

(a) TAPN model of a simple monitoring system
(b) Reachable state space generated by the net in Figure 1a

Fig. 1. Simple monitoring system


Related Work. Our stubborn set reduction is based on the work of Valmari et 
al. [28,39]. We formulate their stubborn set method in the abstract framework of 
labelled transition systems with time and add further axioms for time elapsing 
in order to guarantee preservation of the reachability properties.

For Petri nets, Yoneda and Schlingloff [41] apply a partial order reduction
to one-safe time Petri nets; however, as claimed in [38], the method is mainly
suitable for small to medium models due to a computational overhead, confirmed 
also in [29]. The experimental evaluation in [41] shows only one selected exam- 
ple. Sloan and Buy [38] try to improve on the efficiency of the method, at the 
expense of considering only a rather limited model of simple time Petri nets 
where each transition has a statically assigned duration. Lilius [29] suggests to 
instead use alternative semantics of timed Petri nets to remove the issues related 
to the global nature of time, allowing him to apply directly the untimed partial 
order approaches. However, the semantics is nonstandard and no experiments 
are reported. Another approach is by Virbitskaite and Pokozy [40], who apply 
a partial order method on the region graph of bounded time Petri nets. Region 
graphs are in general not an efficient method for state space representation and 
the method is demonstrated only on a small buffer example with no further 
experimental validation. Recently, partial order techniques were suggested by 
André et al. for parametric time Petri nets [5], however, the approach is working 
only for safe and acyclic nets. Boucheneb and Barkaoui [12-14] discuss a partial 
order reduction technique for timed Petri nets based on contracted state class 
graphs and present a few examples on a prototype implementation (the authors 
do not refer to any publicly available tool). Their method is different from ours as
it aims at adding timing constraints to the independence relation, but it does not
exploit urgent behaviour. Moreover, the models of time Petri nets and timed-arc
Petri nets are, even on the simplest nets, incomparable due to the different ways
of modelling time.

The fact that we are still lacking a practically applicable method for the time
Petri net model is documented by a missing implementation of the technique in 
leading tools for time Petri net model checking like TINA [9] and Romeo [22]. 
We are not aware of any work on partial order reduction technique for the class 
of timed-arc Petri nets that we consider in this paper. This is likely because 
this class of nets provides even more complex timing behaviour, as we consider 
unbounded nets where each token carries its timing information (and needs a 
separate clock to remember the timing), while in time Petri nets timing is
associated only to an a priori fixed number of transitions in the net.

In the setting of timed automata [3], early work on partial order reduction 
includes Bengtsson et al. [8] and Minea [32] where they introduce the notion 
of local as well as global clocks but provide no experimental evaluation. Dams 
et al. [18] introduce the notion of covering in order to generalize dependencies 
but also here no empirical evaluation is provided. Lugiez, Niebert et al. [30,34] 
study the notion of event zones (capturing time-durations between events) and 
use it to implement Mazurkiewicz-trace reductions. Salah et al. [37] introduce 
and implement an exact method based on merging zones resulting from different 
interleavings. The method achieves performance comparable with the approx- 
imate convex-hull abstraction which is by now superseded by the exact LU- 
abstraction [7]. Most recently, Hansen et al. [23] introduce a variant of stubborn 
sets for reducing an abstracted zone graph, thus in general offering overapproximate
analysis. Our technique is orthogonal to the other approaches mentioned
above; not only is the model different but also the application of our reduction
gives exact results and is based on new reduction ideas. Finally, the idea of
applying partial order reduction for independent events that happen at the same 
time appeared also in [15] where the authors, however, use a static method that 
declares actions as independent only if they do not communicate, do not emit 
signals and do not access any shared variables. Our realization of the method to 
the case of timed-arc Petri nets applies a dynamic (on-the-fly) reduction, while 
executing a detailed timing analysis that allows us to declare more transitions 
as independent—sometimes even in the case when they share resources. 


2 Partial Order Reduction for Timed Systems 


We shall now describe the general idea of our partial order reduction technique
(based on stubborn sets [28,39]) in terms of timed transition systems. We
consider real-time delays in the rest of this section, as these results are not
specific only to discrete time semantics. Let $A$ be a given set of actions such that
$A \cap \mathbb{R}_{\ge 0} = \emptyset$ where $\mathbb{R}_{\ge 0}$ stands for the set of nonnegative real numbers.


Definition 1 (Timed Transition System). A timed transition system is a
tuple $(S, s_0, \to)$ where $S$ is a set of states, $s_0 \in S$ is the initial state, and
$\to \subseteq S \times (A \cup \mathbb{R}_{\ge 0}) \times S$ is the transition relation.


If $(s,\alpha,s') \in \to$ we write $s \xrightarrow{\alpha} s'$. We implicitly assume that if $s \xrightarrow{0} s'$ then
$s = s'$, i.e. zero time delays do not change the current state. The set of enabled
actions at a state $s \in S$ is defined as $En(s) \stackrel{\text{def}}{=} \{a \in A \mid \exists s' \in S.\ s \xrightarrow{a} s'\}$.

Given a sequence of actions $w = \alpha_1\alpha_2\alpha_3\ldots\alpha_n \in (A \cup \mathbb{R}_{\ge 0})^*$ we write $s \xrightarrow{w} s'$
iff $s \xrightarrow{\alpha_1} \cdots \xrightarrow{\alpha_n} s'$. If there is a sequence $w$ of length $n$ such that $s \xrightarrow{w} s'$, we
also write $s \to^{n} s'$. Finally, let $\to^*$ be the reflexive and transitive closure of the
relation $\to$ such that $s \to s'$ iff there is $\alpha \in \mathbb{R}_{\ge 0} \cup A$ and $s \xrightarrow{\alpha} s'$.

For the rest of this section, we assume a fixed transition system $(S, s_0, \to)$
and a set of goal states $G \subseteq S$. The reachability problem, given a timed transition
system $(S, s_0, \to)$ and a set of goal states $G$, is to decide whether there is $s' \in G$
such that $s_0 \to^* s'$.

We now develop the theoretical foundations of stubborn sets for timed
transition systems. A state $s \in S$ is zero time if time cannot elapse at $s$. We denote
the zero time property of a state $s$ by the predicate $zt(s)$ and define it as $zt(s)$
iff for all $s' \in S$ and all $d \in \mathbb{R}_{\ge 0}$ if $s \xrightarrow{d} s'$ then $d = 0$. A reduction of a timed
transition system is a function $St : S \to 2^A$. A reduction defines a reduced
transition relation $\to_{St} \subseteq \to$ such that $s \xrightarrow{\alpha}_{St} s'$ iff $s \xrightarrow{\alpha} s'$ and $\alpha \in St(s) \cup \mathbb{R}_{\ge 0}$. For a
given state $s \in S$ we define $\overline{St(s)} \stackrel{\text{def}}{=} A \setminus St(s)$ as the set of all actions that are
not in $St(s)$.


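Definition 2 (Reachability Conditions). A reduction $St$ on a timed transition
system $(S, s_0, \to)$ is reachability preserving if it satisfies the following four
conditions.

(Z) $\forall s \in S.\ \neg zt(s) \Rightarrow En(s) \subseteq St(s)$

(D) $\forall s, s' \in S.\ \forall w \in \overline{St(s)}^*.\ zt(s) \wedge s \xrightarrow{w} s' \Rightarrow zt(s')$

(R) $\forall s, s' \in S.\ \forall w \in \overline{St(s)}^*.\ zt(s) \wedge s \xrightarrow{w} s' \wedge s \notin G \Rightarrow s' \notin G$

(W) $\forall s, s' \in S.\ \forall w \in \overline{St(s)}^*.\ \forall a \in St(s).\ zt(s) \wedge s \xrightarrow{wa} s' \Rightarrow s \xrightarrow{aw} s'$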


Condition Z declares that in a state where a delay is possible, all enabled 
actions become stubborn actions. Condition D guarantees that in order to enable 
a time delay from a state where delaying is not allowed, a stubborn action 
must be executed. Similarly, Condition R requires that a stubborn action must 
be executed before a goal state can be reached from a non-goal state. Finally, 
Condition W allows us to commute stubborn actions with non-stubborn actions. 
The following theorem shows that reachability preserving reductions generate
pruned transition systems where the reachability of goal states is preserved. 


Theorem 1 (Shortest-Distance Reachability Preservation). Let $St$ be
a reachability preserving reduction satisfying Z, D, R and W. Let $s \in S$. If
$s \to^{n} s'$ for some $s' \in G$ then also $s \to_{St}^{m} s''$ for some $s'' \in G$ where $m \le n$.


Proof. We proceed by induction on $n$. Base step. If $n = 0$, then $s = s'$ and
$m = n = 0$. Inductive step. Let $s_0 \xrightarrow{\alpha_0} s_1 \xrightarrow{\alpha_1} \cdots \xrightarrow{\alpha_n} s_{n+1}$ where $s_0 \notin G$
and $s_{n+1} \in G$. Without loss of generality we assume that for all $i$, $0 \le i \le n$,
we have $\alpha_i \neq 0$ (otherwise we can simply skip these 0-delay actions and get a
shorter sequence). We have two cases. Case $\neg zt(s_0)$: by condition Z we have
$En(s_0) \subseteq St(s_0)$ and by the definition of $\to_{St}$ we have $s_0 \xrightarrow{\alpha_0}_{St} s_1$ since
$\alpha_0 \in En(s_0) \cup \mathbb{R}_{\ge 0}$. By the induction hypothesis we have $s_1 \to_{St}^{m} s''$ with $s'' \in G$
and $m \le n$ and $m + 1 \le n + 1$. Case $zt(s_0)$: let $w = \alpha_0\alpha_1\ldots\alpha_n$ and $\alpha_i$ be
such that $\alpha_i \in St(s_0)$ and for all $k < i$ holds that $\alpha_k \notin St(s_0)$, i.e. $\alpha_i$ is the
first stubborn action in $w$. Such an $\alpha_i$ has to exist, otherwise $s_{n+1} \notin G$ due to
condition R. Because of condition D we get $zt(s_k)$ for all $k$, $0 \le k < i$, otherwise
$\alpha_i$ cannot be the first stubborn action in $w$. We can split $w$ as $w = u\alpha_i v$ with
$u \in \overline{St(s_0)}^*$. Since all states in the path to $s_i$ are zero time, by W we can swap
$\alpha_i$ as $s_0 \xrightarrow{\alpha_i} s_1' \xrightarrow{u} s_{i+1} \xrightarrow{v} s_{n+1}$ with $|uv| = n$. Since $\alpha_i \in St(s_0)$ we get $s_0 \xrightarrow{\alpha_i}_{St} s_1'$
and by the induction hypothesis we have $s_1' \to_{St}^{m} s''$ where $s'' \in G$, $m \le n$, and
$m + 1 \le n + 1$.
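
Theorem 1 can be illustrated on a tiny explicit transition system. The Python sketch below is our own toy example and not part of TAPAAL: the dictionary `TRANS`, the state names, the stubborn sets in `ST` and the function `search` are all hypothetical. Two urgent actions `a` and `b` form a commutative diamond from `s0` to the goal `s3`, and only `s3` allows a delay.

```python
from collections import deque

# TRANS[s] = list of (label, target); numbers are delays, strings are actions.
TRANS = {
    "s0": [("a", "s1"), ("b", "s2")],
    "s1": [("b", "s3")],
    "s2": [("a", "s3")],
    "s3": [(1, "s4")],
    "s4": [],
}

def is_delay(label):
    return isinstance(label, (int, float))

def zt(state):
    """Zero-time predicate: no positive delay is possible in `state`."""
    return not any(is_delay(l) and l > 0 for l, _ in TRANS[state])

def enabled(state):
    """En(s): the set of enabled (non-delay) actions."""
    return {l for l, _ in TRANS[state] if not is_delay(l)}

def search(start, goals, stubborn=None):
    """Breadth-first search; with `stubborn`, zero-time states only follow St(s)."""
    visited, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        if s in goals:
            return True, visited
        if stubborn is None or not zt(s):
            allowed = enabled(s)           # condition Z: keep all enabled actions
        else:
            allowed = stubborn.get(s, enabled(s))
        for label, target in TRANS[s]:
            if (is_delay(label) or label in allowed) and target not in visited:
                visited.add(target)
                queue.append(target)
    return False, visited

ST = {"s0": {"a"}, "s1": {"b"}, "s2": {"a"}}   # assumed stubborn sets
full = search("s0", {"s3"})
reduced = search("s0", {"s3"}, ST)
```

In this example the choice `St(s0) = {a}` satisfies D, R and W at `s0` (firing `b` first keeps the state zero time and non-goal, and `ba` commutes to `ab`), so the reduced search still reaches the goal while skipping `s2`.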


3 Timed-Arc Petri Nets 


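We shall now define the model of timed-arc Petri nets (as informally described in
the introduction) together with a reachability logic and a few technical lemmas
needed later on. Let $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$ and $\mathbb{N}_0^\infty = \mathbb{N}_0 \cup \{\infty\}$. We define the set of
well-formed closed time intervals as $\mathcal{I} \stackrel{\text{def}}{=} \{[a,b] \mid a \in \mathbb{N}_0, b \in \mathbb{N}_0^\infty, a \le b\}$ and its
subset $\mathcal{I}^{\mathrm{inv}} \stackrel{\text{def}}{=} \{[0,b] \mid b \in \mathbb{N}_0^\infty\}$ used in age invariants.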


Definition 3 (Timed-Arc Petri Net). A timed-arc Petri net (TAPN) is a
9-tuple $N = (P, T, T_{urg}, IA, OA, g, w, Type, I)$ where
- $P$ is a finite set of places,
- $T$ is a finite set of transitions such that $P \cap T = \emptyset$,
- $T_{urg} \subseteq T$ is the set of urgent transitions,
- $IA \subseteq P \times T$ is a finite set of input arcs,
- $OA \subseteq T \times P$ is a finite set of output arcs,
- $g : IA \to \mathcal{I}$ is a time constraint function assigning guards (time intervals) to
  input arcs s.t.
  • if $(p,t) \in IA$ and $t \in T_{urg}$ then $g((p,t)) = [0, \infty]$,
- $w : IA \cup OA \to \mathbb{N}$ is a function assigning weights to input and output arcs,
- $Type : IA \cup OA \to Types$ is a type function assigning a type to all arcs where
  $Types = \{Normal, Inhib\} \cup \{Transport_j \mid j \in \mathbb{N}\}$ such that
  • if $Type(z) = Inhib$ then $z \in IA$ and $g(z) = [0, \infty]$,
  • if $Type((p,t)) = Transport_j$ for some $(p,t) \in IA$ then there is exactly one
    $(t,p') \in OA$ such that $Type((t,p')) = Transport_j$,
  • if $Type((t,p')) = Transport_j$ for some $(t,p') \in OA$ then there is exactly
    one $(p,t) \in IA$ such that $Type((p,t)) = Transport_j$,
  • if $Type((p,t)) = Transport_j = Type((t,p'))$ then $w((p,t)) = w((t,p'))$,
- $I : P \to \mathcal{I}^{\mathrm{inv}}$ is a function assigning age invariants to places.


Note that for transport arcs we assume that they come in pairs (for each 
type Transport;) and that their weights match. Also for inhibitor arcs and for 
input arcs to urgent transitions, we require that the guards are $[0, \infty]$.


Before we give the formal semantics of the model, let us fix some notation. 


Let $N = (P, T, T_{urg}, IA, OA, g, w, Type, I)$ be a TAPN. We denote by
$^\bullet x \stackrel{\text{def}}{=} \{y \in P \cup T \mid (y,x) \in IA \cup OA,\ Type((y,x)) \neq Inhib\}$ the preset of a transition or a
place $x$. Similarly, the postset is defined as $x^\bullet \stackrel{\text{def}}{=} \{y \in P \cup T \mid (x,y) \in (IA \cup OA)\}$.
We denote by $^\circ t \stackrel{\text{def}}{=} \{p \in P \mid (p,t) \in IA \wedge Type((p,t)) = Inhib\}$ the inhibitor
preset of a transition $t$. The inhibitor postset of a place $p$ is defined as
$p^\circ \stackrel{\text{def}}{=} \{t \in T \mid (p,t) \in IA \wedge Type((p,t)) = Inhib\}$. Let $\mathcal{B}(\mathbb{R}_{\ge 0})$ be the set of all finite
multisets over $\mathbb{R}_{\ge 0}$. A marking $M$ on $N$ is a function $M : P \to \mathcal{B}(\mathbb{R}_{\ge 0})$ where
for every place $p \in P$ and every token $x \in M(p)$ we have $x \in I(p)$, in other
words all tokens have to satisfy the age invariants. The set of all markings in a
net $N$ is denoted by $\mathcal{M}(N)$.

We write $(p, x)$ to denote a token at a place $p$ with the age $x \in \mathbb{R}_{\ge 0}$. Then
$M = \{(p_1, x_1), (p_2, x_2), \ldots, (p_n, x_n)\}$ is a multiset representing a marking $M$
with $n$ tokens of ages $x_i$ in places $p_i$. We define the size of a marking as
$|M| = \sum_{p \in P} |M(p)|$ where $|M(p)|$ is the number of tokens located in the place $p$. A
marked TAPN $(N, M_0)$ is a TAPN $N$ together with an initial marking $M_0$ with
all tokens of age 0.


Definition 4 (Enabledness). Let $N = (P, T, T_{urg}, IA, OA, g, w, Type, I)$ be a TAPN. We say that a transition $t \in T$ is enabled in a marking $M$ by the multisets of tokens $In = \{(p, x_p^1), (p, x_p^2), \dots, (p, x_p^{w((p,t))}) \mid p \in {}^\bullet t\} \subseteq M$ and $Out = \{(p', x_{p'}^1), (p', x_{p'}^2), \dots, (p', x_{p'}^{w((t,p'))}) \mid p' \in t^\bullet\}$ if

- for all input arcs except the inhibitor arcs, the tokens from $In$ satisfy the age guards of the arcs, i.e.

  $\forall p \in {}^\bullet t.\ x_p^i \in g((p,t))$ for $1 \leq i \leq w((p,t))$

- for any inhibitor arc pointing from a place $p$ to the transition $t$, the number of tokens in $p$ is smaller than the weight of the arc, i.e.

  $\forall (p,t) \in IA.\ Type((p,t)) = Inhib \Rightarrow |M(p)| < w((p,t))$

- for all input arcs and output arcs which constitute a transport arc, the age of the input token must be equal to the age of the output token and satisfy the invariant of the output place, i.e.

  $\forall (p,t) \in IA.\ \forall (t,p') \in OA.\ Type((p,t)) = Type((t,p')) = Transport_j \Rightarrow (x_p^i = x_{p'}^i \wedge x_{p'}^i \in I(p'))$ for $1 \leq i \leq w((p,t))$

- for all normal output arcs, the age of the output token is 0, i.e.

  $\forall (t,p') \in OA.\ Type((t,p')) = Normal \Rightarrow x_{p'}^i = 0$ for $1 \leq i \leq w((t,p'))$.
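To make the first two conditions of Definition 4 concrete, the following Python sketch checks the guard and inhibitor constraints for a single transition. It is a simplified illustration only: transport arcs and output-place invariants are left out, and the data layout (`marking`, `inputs`, `inhibitors`) as well as the function name `is_enabled` are assumptions of this sketch, not part of the formalism or of any tool.

```python
from collections import Counter

def is_enabled(marking, inputs, inhibitors):
    """Check guard and inhibitor conditions for one transition.

    marking:    dict place -> Counter(age -> number of tokens)  (assumed layout)
    inputs:     list of (place, (lo, hi), weight) for non-inhibitor input arcs
    inhibitors: list of (place, weight) for inhibitor arcs
    """
    for place, (lo, hi), weight in inputs:
        tokens = marking.get(place, Counter())
        # Count the tokens whose age lies inside the guard interval.
        suitable = sum(n for age, n in tokens.items() if lo <= age <= hi)
        if suitable < weight:
            return False
    for place, weight in inhibitors:
        # An inhibitor arc blocks the transition once |M(p)| >= w((p,t)).
        if sum(marking.get(place, Counter()).values()) >= weight:
            return False
    return True
```

For example, with one token of age 3 and one of age 6 in a place `p`, an input arc with guard [2, 5] and weight 1 is satisfied, while the same arc with weight 2 is not.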


A given marked TAPN $(N, M_0)$ defines a timed transition system $T(N) \stackrel{\text{def}}{=} (\mathcal{M}(N), M_0, \to)$ where the states are markings and the transitions are as follows.

- If $t \in T$ is enabled in a marking $M$ by the multisets of tokens $In$ and $Out$ then $t$ can fire and produce the marking $M' = (M \setminus In) \uplus Out$ where $\uplus$ is the multiset sum operator and $\setminus$ is the multiset difference operator; we write $M \xrightarrow{t} M'$ for this action transition.

- A time delay $d \in \mathbb{N}_0$ is allowed in $M$ if
  • $(x + d) \in I(p)$ for all $p \in P$ and all $x \in M(p)$, i.e. by delaying $d$ time units no token violates any of the age invariants, and
  • if $M \xrightarrow{t} M'$ for some $t \in T_{urg}$ then $d = 0$, i.e. enabled urgent transitions disallow time passing.
  By delaying $d$ time units in $M$ we reach the marking $M'$ defined as $M'(p) = \{x + d \mid x \in M(p)\}$ for all $p \in P$; we write $M \xrightarrow{d} M'$ for this delay transition.


Note that the semantics above defines the discrete-time semantics as the 
delays are restricted to nonnegative integers. It is well known that for timed-arc 
Petri nets with nonstrict intervals, the marking reachability problem on discrete 
and continuous time nets coincide [31]. This is, however, not the case for more 
complex properties like liveness that can be expressed in the CTL logic (for 
counter examples that can be expressed in CTL see e.g. [25]). 
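The two kinds of transitions in the semantics above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: markings are stored as a dictionary from places to multisets of ages, each invariant is represented only by its upper bound, and whether an urgent transition is enabled is passed in as a flag rather than computed. All names (`delay_allowed`, `delay`, `fire`) are hypothetical.

```python
from collections import Counter
import math

def delay_allowed(marking, invariant_bound, d, urgent_enabled=False):
    """A delay d is allowed if no token outgrows its place invariant
    and no urgent transition is enabled (unless d == 0)."""
    if urgent_enabled and d != 0:
        return False
    return all(age + d <= invariant_bound.get(place, math.inf)
               for place, ages in marking.items()
               for age in ages)

def delay(marking, d):
    """Delay transition: every token ages by d time units."""
    return {place: Counter({age + d: n for age, n in ages.items()})
            for place, ages in marking.items()}

def fire(marking, tokens_in, tokens_out):
    """Action transition: M' = (M minus In) plus Out, as multisets.
    tokens_in and tokens_out are lists of (place, age) pairs."""
    new = {place: Counter(ages) for place, ages in marking.items()}
    for place, age in tokens_in:
        if new.get(place, Counter())[age] == 0:
            raise ValueError("token to consume is not in the marking")
        new[place][age] -= 1
    for place, age in tokens_out:
        new.setdefault(place, Counter())[age] += 1
    # Unary plus drops entries whose count fell to zero.
    return {place: +ages for place, ages in new.items()}
```

Note that a token whose age already equals the upper bound of its invariant blocks every positive delay, which is one of the two situations in which time cannot elapse.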


3.1 Reachability Logic and Interesting Sets of Transitions 


We now describe a logic for expressing the properties of markings based on the number of tokens in places and transition enabledness, inspired by the logic used in the Model Checking Contest (MCC) Property Language [27]. Let $N = (P, T, T_{urg}, IA, OA, g, w, Type, I)$ be a TAPN. The formulae of the logic are given by the abstract syntax:

$\varphi ::= deadlock \mid t \mid e_1 \bowtie e_2 \mid \varphi_1 \wedge \varphi_2 \mid \varphi_1 \vee \varphi_2 \mid \neg\varphi$

$e ::= c \mid p \mid e_1 \oplus e_2$

where $t \in T$, $\bowtie\ \in \{<, \leq, =, \neq, >, \geq\}$, $c \in \mathbb{Z}$, $p \in P$, and $\oplus \in \{+, -, *\}$. Let $\Phi$ be the set of all such formulae and let $E_N$ be the set of arithmetic expressions over the net $N$. The semantics of $\varphi$ in a marking $M \in \mathcal{M}(N)$ is given by

$M \models deadlock$ if $En(M) = \emptyset$
$M \models t$ if $t \in En(M)$
$M \models e_1 \bowtie e_2$ if $eval_M(e_1) \bowtie eval_M(e_2)$

assuming a standard semantics for Boolean operators and where the semantics of arithmetic expressions in a marking $M$ is as follows: $eval_M(c) = c$, $eval_M(p) = |M(p)|$, and $eval_M(e_1 \oplus e_2) = eval_M(e_1) \oplus eval_M(e_2)$.

Table 1. Interesting transitions of $\varphi$ (assuming $M \not\models \varphi$, otherwise $A_M(\varphi) = \emptyset$)

Formula $\varphi$ | $A_M(\varphi)$ | $A_M(\neg\varphi)$
$deadlock$ | $({}^\bullet t)^\bullet \cup {}^\bullet({}^\circ t)$ for some $t \in En(M)$ | $\emptyset$
$t$ | ${}^\bullet p$ for some $p \in {}^\bullet t$ where $M(p) < w((p,t))$, or $p^\bullet$ for some $p \in {}^\circ t$ where $M(p) \geq w((p,t))$ | $({}^\bullet t)^\bullet \cup {}^\bullet({}^\circ t)$
$e_1 < e_2$ | $decr_M(e_1) \cup incr_M(e_2)$ | $A_M(e_1 \geq e_2)$
$e_1 \leq e_2$ | $decr_M(e_1) \cup incr_M(e_2)$ | $A_M(e_1 > e_2)$
$e_1 > e_2$ | $incr_M(e_1) \cup decr_M(e_2)$ | $A_M(e_1 \leq e_2)$
$e_1 \geq e_2$ | $incr_M(e_1) \cup decr_M(e_2)$ | $A_M(e_1 < e_2)$
$e_1 = e_2$ | $decr_M(e_1) \cup incr_M(e_2)$ if $eval_M(e_1) > eval_M(e_2)$; $incr_M(e_1) \cup decr_M(e_2)$ if $eval_M(e_1) < eval_M(e_2)$ | $A_M(e_1 \neq e_2)$
$e_1 \neq e_2$ | $incr_M(e_1) \cup decr_M(e_1) \cup incr_M(e_2) \cup decr_M(e_2)$ | $A_M(e_1 = e_2)$
$\varphi_1 \wedge \varphi_2$ | $A_M(\varphi_i)$ for some $i \in \{1,2\}$ where $M \not\models \varphi_i$ | $A_M(\neg\varphi_1 \vee \neg\varphi_2)$
$\varphi_1 \vee \varphi_2$ | $A_M(\varphi_1) \cup A_M(\varphi_2)$ | $A_M(\neg\varphi_1 \wedge \neg\varphi_2)$

Table 2. Increasing and decreasing transitions of expression $e$

Expression $e$ | $incr_M(e)$ | $decr_M(e)$
$c$ | $\emptyset$ | $\emptyset$
$p$ | ${}^\bullet p$ | $p^\bullet$
$e_1 + e_2$ | $incr_M(e_1) \cup incr_M(e_2)$ | $decr_M(e_1) \cup decr_M(e_2)$
$e_1 - e_2$ | $incr_M(e_1) \cup decr_M(e_2)$ | $decr_M(e_1) \cup incr_M(e_2)$
$e_1 * e_2$ | $incr_M(e_1) \cup decr_M(e_1) \cup incr_M(e_2) \cup decr_M(e_2)$ | $incr_M(e_1) \cup decr_M(e_1) \cup incr_M(e_2) \cup decr_M(e_2)$
Let $\varphi$ be a formula. We are interested in the question of whether we can reach from the initial marking some of the goal markings from $G_\varphi = \{M \in \mathcal{M}(N) \mid M \models \varphi\}$. In order to guide the reduction such that transitions that lead to the goal markings are included in the generated stubborn set, we define the notion of interesting transitions for a marking $M$ relative to $\varphi$, and we let $A_M(\varphi) \subseteq T$ denote the set of interesting transitions. Formally, we shall require that whenever $M \xrightarrow{w} M'$ via a sequence of transitions $w = t_1 t_2 \dots t_n \in T^*$ where $M \notin G_\varphi$ and $M' \in G_\varphi$, then there must exist $i$, $1 \leq i \leq n$, such that $t_i \in A_M(\varphi)$.

Table 1 gives a possible definition of $A_M(\varphi)$. Let us remark that the definition is nondeterministic in several places, allowing for a variety of sets of interesting transitions. Table 1 uses the functions $incr_M : E_N \to 2^T$ and $decr_M : E_N \to 2^T$ defined in Table 2. These functions take as input an expression $e$, and return all transitions that can possibly, when fired, increase resp. decrease the evaluation of $e$. The following lemma formally states the required property of the functions $incr_M$ and $decr_M$.


Lemma 1. Let $N = (P, T, T_{urg}, IA, OA, g, w, Type, I)$ be a TAPN and $M \in \mathcal{M}(N)$ a marking. Let $e \in E_N$ and let $M \xrightarrow{w} M'$ where $w = t_1 t_2 \dots t_n \in T^*$.

- If $eval_M(e) < eval_{M'}(e)$ then there is $i$, $1 \leq i \leq n$, such that $t_i \in incr_M(e)$.
- If $eval_M(e) > eval_{M'}(e)$ then there is $i$, $1 \leq i \leq n$, such that $t_i \in decr_M(e)$.
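The evaluation of arithmetic expressions and the functions $incr_M$ and $decr_M$ are plain structural recursions, which the following Python sketch spells out. The expression classes, the function names, and the representation of a marking by token counts and of presets and postsets by dictionaries are assumptions made for this illustration only.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Place:
    name: str

@dataclass(frozen=True)
class BinOp:
    op: str            # one of '+', '-', '*'
    left: "Expr"
    right: "Expr"

Expr = Union[Const, Place, BinOp]

def evaluate(e, token_count):
    """eval_M(e); token_count maps a place name to |M(p)|."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Place):
        return token_count.get(e.name, 0)
    l, r = evaluate(e.left, token_count), evaluate(e.right, token_count)
    return {"+": l + r, "-": l - r, "*": l * r}[e.op]

def incr_decr(e, preset, postset):
    """Return the pair (incr(e), decr(e)) following Table 2.
    preset/postset map a place name to the transitions that can
    add tokens to it / remove tokens from it."""
    if isinstance(e, Const):
        return frozenset(), frozenset()
    if isinstance(e, Place):
        return (frozenset(preset.get(e.name, ())),
                frozenset(postset.get(e.name, ())))
    li, ld = incr_decr(e.left, preset, postset)
    ri, rd = incr_decr(e.right, preset, postset)
    if e.op == "+":
        return li | ri, ld | rd
    if e.op == "-":
        return li | rd, ld | ri
    if e.op == "*":
        both = li | ld | ri | rd   # sign of a product is not tracked
        return both, both
    raise ValueError(f"unknown operator {e.op!r}")
```

For the expression $p - (q + 1)$, the increasing transitions are those that add tokens to $p$ or remove tokens from $q$, exactly as the subtraction row of Table 2 prescribes.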


We finish this section with the main technical lemma, showing that at least 
one interesting transition must be fired before we can reach a marking satisfying 
a given reachability formula. 


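Lemma 2. Let $N = (P, T, T_{urg}, IA, OA, g, w, Type, I)$ be a TAPN, let $M \in \mathcal{M}(N)$ be its marking and let $\varphi \in \Phi$ be a given formula. If $M \not\models \varphi$ and $M \xrightarrow{w} M'$ where $w \in (T \setminus A_M(\varphi))^{*}$ then $M' \not\models \varphi$.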


4 Partial Order Reductions for TAPN 


We are now ready to state the main theorem that provides sufficient syntax-driven conditions for a reduction in order to guarantee preservation of reachability. Let $N = (P, T, T_{urg}, IA, OA, g, w, Type, I)$ be a TAPN, let $M \in \mathcal{M}(N)$ be a marking of $N$, and let $\varphi \in \Phi$ be a formula. We recall that $A_M(\varphi)$ is the set of interesting transitions as defined earlier.


Theorem 2 (Reachability Preserving Closure). Let $St$ be a reduction such that for all $M \in \mathcal{M}(N)$ it satisfies the following conditions.

1 If $\neg zt(M)$ then $En(M) \subseteq St(M)$.
2 If $zt(M)$ then $A_M(\varphi) \subseteq St(M)$.
3 If $zt(M)$ then either
  (a) there is $t \in T_{urg} \cap En(M) \cap St(M)$ where ${}^\bullet({}^\circ t) \subseteq St(M)$, or
  (b) there is $p \in P$ where $I(p) = [a,b]$ and $b \in M(p)$ such that $t \in St(M)$ for every $t \in p^\bullet$ where $b \in g((p,t))$.
4 For all $t \in St(M) \setminus En(M)$ either
  (a) there is $p \in {}^\bullet t$ such that $|\{x \in M(p) \mid x \in g((p,t))\}| < w((p,t))$ and
    - $t' \in St(M)$ for all $t' \in {}^\bullet p$ where there is $p' \in {}^\bullet t'$ with $Type((t',p)) = Type((p',t')) = Transport_j$ and where $g((p',t')) \cap g((p,t)) \neq \emptyset$, and
    - if $0 \in g((p,t))$ then also ${}^\bullet p \subseteq St(M)$, or
  (b) there is $p \in {}^\circ t$ where $|M(p)| \geq w((p,t))$ such that
    - $t' \in St(M)$ for all $t' \in p^\bullet$ where $M(p) \cap g((p,t')) \neq \emptyset$.
5 For all $t \in St(M) \cap En(M)$ we have
  (a) $t' \in St(M)$ for every $t' \in p^\bullet$ where $p \in {}^\bullet t$ and $g((p,t)) \cap g((p,t')) \neq \emptyset$, and
  (b) $(t^\bullet)^\circ \subseteq St(M)$.

Then $St$ satisfies Z, D, R, and W.

Fig. 2. Cases for Condition 3: (a) Transitions $t_1$ and $t_2$ can disable resp. inhibit the urgent transition $t$; (b) Transition $t_2$ can remove the token of age 5 from $p$.


Let us now briefly discuss the conditions of Theorem 2. Clearly, Condition 1 ensures that if time can elapse, we include all enabled transitions into the stubborn set and Condition 2 guarantees that all interesting transitions (those that can potentially make the reachability proposition true) are included as well.

Condition 3 makes sure that if time elapsing is disabled then any transition that can possibly enable time elapsing will be added to the stubborn set. There are two situations in which time progress can be disabled. The first is that there is an urgent enabled transition, like the transition $t$ in Fig. 2a. Since $t_2$ can add a token to $p_2$ and by that inhibit $t$, Condition 3a makes sure that $t_2$ is added into the stubborn set in order to satisfy D. As $t_1$ can remove the token of age 3 from $p_1$ and hence disable $t$, we must add $t_1$ to the stubborn set too (guaranteed by Condition 5a). The other situation when time gets stopped is when a place with an age invariant contains a token that disallows time passing, like in Fig. 2b where time is disabled because the place $p$ has a token of age 5, which is the maximum possible age of tokens in $p$ due to the age invariant. Since $t_2$ can remove the token of age 5 from $p$, we include it in the stubborn set due to Condition 3b. On the other hand, $t_1$ does not have to be included in the stubborn set as its firing cannot remove the token of age 5 from $p$.

Condition 4 makes sure that a disabled stubborn transition can never be enabled by a non-stubborn transition. There are two reasons why a transition is disabled. The first, as in Fig. 3a where $t$ is disabled, is that there is an insufficient number of tokens of appropriate age to fire the transition. In this case, Condition 4a makes sure that transitions that can add tokens of a suitable age via transport arcs are included in the stubborn set. This is the case for the transition $t_1$ in our example, as [2,5] has a nonempty intersection with [4,6]. On the other hand, $t_3$ does not have to be added. As the transition $t_2$ only adds fresh tokens of age 0 to $p$ via normal arcs, there is no need to add $t_2$ into the stubborn set either. The other reason for a transition to be disabled is due to inhibitor arcs, as shown on the transition $t$ in Fig. 3b. Condition 4b makes sure that $t_1$ is added to the stubborn set, as it can enable $t$ (the interval [6,8] has a nonempty intersection with the tokens of age 6 and 7 in the place $p$). As this is not the case for $t_2$, this transition can be left out from the stubborn set.

Fig. 3. Cases for Condition 4: (a) Transition $t_1$ can transport well-aged tokens into $p$ and enable $t$; (b) Transition $t_1$ can enable $t$ by removing tokens from $p$.

Fig. 4. Cases for Condition 5: (a) Stubborn transition $t$ can disable both $t_2$ and $t_3$.

Finally, Condition 5 guarantees that enabled stubborn transitions can never disable any non-stubborn transitions. For an illustration, take a look at Fig. 4a and assume that $t$ is an enabled stubborn transition. Firing of $t$ can remove the token of age 4 from $p$ and disable $t_2$, hence $t_2$ must become stubborn by Condition 5a in order to satisfy W. On the other hand, the intervals [6,8] and [2,5] have empty intersection, so there is no need to declare $t_1$ as a stubborn transition. Moreover, firing of $t$ can also disable the transition $t_3$ due to the inhibitor arc, so we must add $t_3$ to the stubborn set by Condition 5b.

The conditions of Theorem 2 can be turned into an iterative saturation algorithm for the construction of stubborn sets as shown in Algorithm 1. When running this algorithm for the net in our running example, we can reduce the state space exploration for fireability of the transition $t$ as depicted in Fig. 1b. Our last theorem states that the algorithm returns stubborn subsets of enabled transitions that satisfy the four conditions of Theorem 1 and hence we preserve the reachability property as well as the minimum path to some reachable goal.

Algorithm 1. Construction of a reachability preserving stubborn set

input : $N = (P, T, T_{urg}, IA, OA, g, w, Type, I)$, $M \in \mathcal{M}(N)$, $\varphi \in \Phi$
output : $St(M) \cap En(M)$

if $\neg zt(M)$ then
    return $En(M)$;
$X := \emptyset$; $Y := A_M(\varphi)$;
if $T_{urg} \cap En(M) \neq \emptyset$ then
    pick any $t \in T_{urg} \cap En(M)$;
    if $t \notin Y$ then
        $Y := Y \cup \{t\}$;
    $Y := Y \cup {}^\bullet({}^\circ t)$;
else
    pick any $p \in P$ where $I(p) = [a,b]$ and $b \in M(p)$
    forall $t \in p^\bullet$ do
        if $b \in g((p,t))$ then
            $Y := Y \cup \{t\}$;
while $Y \neq \emptyset$ do
    pick any $t \in Y$;
    if $t \notin En(M)$ then
        if $\exists p \in {}^\bullet t.\ |\{x \in M(p) \mid x \in g((p,t))\}| < w((p,t))$ then
            pick any such $p$;
            forall $t' \in {}^\bullet p \setminus X$ do
                forall $p' \in {}^\bullet t'$ do
                    if $Type((t',p)) = Type((p',t')) = Transport_j \wedge g((p',t')) \cap g((p,t)) \neq \emptyset$ then
                        $Y := Y \cup \{t'\}$;
            if $0 \in g((p,t))$ then
                $Y := Y \cup ({}^\bullet p \setminus X)$;
        else
            pick any $p \in {}^\circ t$ s.t. $|M(p)| \geq w((p,t))$;
            forall $t' \in p^\bullet \setminus X$ do
                if $M(p) \cap g((p,t')) \neq \emptyset$ then
                    $Y := Y \cup \{t'\}$;
    else
        forall $p \in {}^\bullet t$ do
            $Y := Y \cup (\{t' \in p^\bullet \mid g((p,t)) \cap g((p,t')) \neq \emptyset\} \setminus X)$;
        $Y := Y \cup ((t^\bullet)^\circ \setminus X)$;
    $Y := Y \setminus \{t\}$;
    $X := X \cup \{t\}$;
return $X \cap En(M)$;


Theorem 3. Algorithm 1 terminates and returns $St(M) \cap En(M)$ for some reduction $St$ that satisfies Z, D, R, and W.
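The saturation loop at the core of Algorithm 1 can be sketched as a generic worklist procedure in Python. This is only a skeleton under stated assumptions: the net-specific parts (the interesting transitions, the seed chosen for Condition 3, and the closure steps for Conditions 4 and 5) are supplied by the caller as the hypothetical parameters `interesting`, `seed`, `close_disabled` and `close_enabled`, and the zero-time test is assumed to have been done beforehand.

```python
def stubborn_enabled(enabled, interesting, seed, close_enabled, close_disabled):
    """Worklist saturation in the style of Algorithm 1.

    enabled:        set of transitions enabled in the current marking
    interesting:    transitions required by Condition 2
    seed:           transitions required by Condition 3
    close_enabled:  t -> transitions required by Condition 5 for enabled t
    close_disabled: t -> transitions required by Condition 4 for disabled t
    Returns the enabled transitions of the computed stubborn set.
    """
    unprocessed = set(interesting) | set(seed)   # the set Y
    processed = set()                            # the set X
    while unprocessed:
        t = unprocessed.pop()
        closure = close_enabled(t) if t in enabled else close_disabled(t)
        # Only transitions that were not handled yet go back on the worklist.
        unprocessed |= set(closure) - processed - {t}
        processed.add(t)
    return processed & set(enabled)
```

Each transition moves from the worklist to the processed set at most once, so the loop runs at most once per transition, which mirrors the termination argument behind Theorem 3.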


5 Implementation and Experiments 


We implemented our partial order method in C++ and integrated it within the model checker TAPAAL [19] and its discrete time engine verifydtapn [4,11]. We evaluate our partial order reduction on a wide range of case studies.

PatientMonitoring. The patient monitoring system [17] models a medical system that through sensors periodically scans a patient's vital functions, making sure that abnormal situations are detected and reported within given deadlines. The timed-arc Petri net model was described in [17] for two sensors monitoring the patient's pulse rate and oxygen saturation level. We scale the case study by adding additional sensors.

BloodTransfusion. This case study models a larger blood transfusion workflow [16], the benchmarking case study of the little-JIL language. The timed-arc Petri net model was described in [10] and we verify that the workflow is free of deadlocks (unless all sub-workflows correctly terminate). The problem is scaled by the number of patients receiving a blood transfusion.

FireAlarm. This case study uses a modified (due to trade secrets) fire alarm system owned by a German company [20,21]. It models a four-channel round-robin frequency-hopping transmission scheduling in order to ensure a reliable communication between a number of wireless sensors (by which the case study is scaled) and a central control unit. The protocol is based on time-division multiple access (TDMA) channel access and we verify that for a given frequency-jammer, it never takes more than three cycles before a fire alarm is communicated to the central unit.

BAwPC. Business Activity with Participant Completion (BAwPC) is a web-service coordination protocol from the WS-BA specification [33] that ensures a consistent agreement on the outcome of long-running distributed applications. In [26] it was shown that the protocol is flawed and a correct, enhanced variant was suggested. We model check this enhanced protocol and scale it by the capacity of the communication buffer.

Fischer. Here we consider the classical Fischer's protocol for ensuring mutual exclusion for a number of timed processes. The timed-arc Petri net model is taken from [2] and it is scaled by the number of processes.

LynchShavit. This is another time-based mutual exclusion algorithm by Lynch and Shavit, with the timed-arc Petri net model taken from [1] and scaled by the number of processes.

MPEG2. This case study describes the workflow of the MPEG-2 video encoding algorithm run on a multicore processor (the timed-arc Petri net model was published in [35]) and we verify the maximum duration of the workflow. The model is scaled by the number of B frames in the $IB^nP$ frame sequence.

AlternatingBit. This is a classical case study of the alternating bit protocol, based on the timed-arc Petri net model given in [24]. The purpose of the protocol is to ensure a safe communication between a sender and a receiver over an unreliable medium. Messages are time-stamped in order to compensate (via retransmission) for the possibility of losing messages. The case study is scaled by the maximum number of messages in transfer.

Table 3. Experiments with and without partial order reduction (POR)

Model | Time (s) NORMAL | Time (s) POR | Markings x 1000 NORMAL | Markings x 1000 POR | Reduction %Time | Reduction %Markings
PatientMonitoring 3 | 5.88 | 0.35 | 333 | 28 | 94 | 92
PatientMonitoring 4 | 22.06 | 0.48 | 1001 | 36 | 98 | 96
PatientMonitoring 5 | 80.76 | 0.65 | 3031 | 44 | 99 | 99
PatientMonitoring 6 | 305.72 | 0.85 | 9248 | 54 | 100 | 99
PatientMonitoring 7 | 5516.93 | 5.75 | 130172 | 318 | 100 | 100
BloodTransfusion 2 | 0.32 | 0.41 | 48 | 43 | -28 | 11
BloodTransfusion 3 | 7.88 | 6.45 | 792 | 546 | 18 | 31
BloodTransfusion 4 | 225.18 | 109.30 | 14904 | 7564 | 51 | 49
BloodTransfusion 5 | 5256.01 | 1611.14 | 248312 | 94395 | 69 | 62
FireAlarm 10 | 28.95 | 14.17 | 796 | 498 | 51 | 37
FireAlarm 12 | 116.97 | 17.51 | 1726 | 526 | 85 | 70
FireAlarm 14 | 598.89 | 21.65 | 5367 | 554 | 96 | 90
FireAlarm 16 | 5029.25 | 29.48 | 19845 | 582 | 99 | 97
FireAlarm 18 | 27981.90 | 34.55 | 77675 | 610 | 100 | 99
FireAlarm 20 | 154495.29 | 41.47 | 308914 | 638 | 100 | 100
FireAlarm 80 | >2 days | 602.71 | - | 1522 | - | -
FireAlarm 125 | >2 days | 1957.00 | - | 2260 | - | -
BAwPC 2 | 0.21 | 0.41 | 19 | 16 | -95 | 15
BAwPC 4 | 3.45 | 4.04 | 193 | 125 | -17 | 35
BAwPC 6 | 23.01 | 17.08 | 900 | 452 | 26 | 50
BAwPC 8 | 73.73 | 39.29 | 2294 | 952 | 47 | 58
BAwPC 10 | 135.62 | 60.66 | 3819 | 1412 | 55 | 63
BAwPC 12 | 173.09 | 73.53 | 4736 | 1665 | 58 | 65
Fischer 9 | 3.24 | 2.37 | 281 | 233 | 27 | 17
Fischer 11 | 12.68 | 8.73 | 923 | 738 | 31 | 20
Fischer 13 | 42.52 | 28.53 | 2628 | 2041 | 33 | 22
Fischer 15 | 121.31 | 77.50 | 6700 | 5066 | 36 | 24
Fischer 17 | 313.69 | 198.36 | 15622 | 11536 | 37 | 26
Fischer 19 | 748.52 | 456.30 | 33843 | 24469 | 39 | 28
Fischer 21 | 1622.69 | 985.07 | 68934 | 48904 | 39 | 29
LynchShavit 9 | 3.98 | 3.31 | 282 | 234 | 17 | 17
LynchShavit 11 | 15.73 | 12.19 | 925 | 740 | 23 | 20
LynchShavit 13 | 51.08 | 37.97 | 2631 | 2043 | 26 | 22
LynchShavit 15 | 146.63 | 103.63 | 6703 | 5069 | 29 | 24
LynchShavit 17 | 384.52 | 258.09 | 15626 | 11540 | 33 | 26
LynchShavit 19 | 907.60 | 597.68 | 33848 | 24474 | 34 | 28
LynchShavit 21 | 2011.58 | 1307.72 | 68940 | 48910 | 35 | 29
MPEG2 3 | 13.17 | 15.43 | 2188 | 2187 | -17 | 0
MPEG2 4 | 109.62 | 125.45 | 15190 | 15180 | -14 | 0
MPEG2 5 | 755.54 | 840.84 | 87568 | 87478 | -11 | 0
MPEG2 6 | 4463.19 | 5092.58 | 435023 | 434354 | -14 | 0
AlternatingBit 20 | 9.17 | 9.51 | 617 | 617 | -4 | 0
AlternatingBit 30 | 48.20 | 49.13 | 2804 | 2804 | -2 | 0
AlternatingBit 40 | 161.18 | 162.94 | 8382 | 8382 | -1 | 0
AlternatingBit 50 | 408.34 | 408.86 | 19781 | 19781 | 0 | 0

All experiments were run on AMD Opteron 6376 Processors with 500 GB memory. In Table 3 we compare the time to verify a model without (NORMAL) and with (POR) partial order reduction, the number of explored markings (in thousands) and the percentage of time and memory reduction. We can observe clear benefits of our technique on PatientMonitoring, BloodTransfusion and FireAlarm where we are both exponentially faster and explore only a fraction of all reachable markings. For example in FireAlarm, we are able to verify its correctness for all 125 sensors, as it is required by the German company [21]. This would be clearly infeasible without the use of partial order reduction.
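The %Time column of Table 3 is consistent with the relative saving $(1 - \mathrm{POR}/\mathrm{NORMAL}) \cdot 100$ rounded to an integer. The text does not state the formula, so the following Python sketch is our reading of the table, and the helper name `reduction_percent` is hypothetical.

```python
def reduction_percent(normal, por):
    """Relative saving of POR over NORMAL in percent, rounded to an integer.
    A negative value means that POR was slower than the baseline."""
    return round((1.0 - por / normal) * 100.0)

# Verification times in seconds taken from Table 3.
print(reduction_percent(5.88, 0.35))     # PatientMonitoring 3
print(reduction_percent(116.97, 17.51))  # FireAlarm 12
print(reduction_percent(0.21, 0.41))     # BAwPC 2
```

The same formula applied to the markings columns can differ by one percentage point from the printed values, presumably because the marking counts in the table are themselves rounded to thousands.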

In BAwPC, we can notice that for the smallest instances, there is some overhead from computing the stubborn sets. However, it clearly pays off for the larger instances where the percentages of reduced state space are closely followed by the percentages of the verification times and in fact improve with the larger instances. The Fischer and LynchShavit case studies demonstrate that even moderate reductions of the state space imply considerable reduction in the running time and computing the stubborn sets is well worth the extra effort.

MPEG2 is an example of a model that allows only negligible reduction of the state space size, and where we observe an actual slowdown in the running time due to the computation of the stubborn sets. Nevertheless, the overhead stays constant in the range of about 15%, even for increasing instance sizes. Finally, the AlternatingBit protocol does not allow for any reduction of the state space (even though it contains age invariants) but the overhead in the running time is negligible.

We observed similar performance of our technique also for the cases where 
the reachability property does not hold and a counter example can be generated. 


6 Conclusion 


We suggested a simple, yet powerful and application-ready partial order reduc- 
tion for timed systems. The reduction comes into effect as soon as the timed sys- 
tem enters an urgent configuration where time cannot elapse until a nonempty 
sequence of transitions gets executed. The method is implemented and fully inte- 
grated, including GUI support, into the open-source tool TAPAAL. We demon- 
strated its practical applicability on several case studies and conclude that com- 
puting the stubborn sets causes only a minimal overhead while providing large 
benefits for reducing the state space in numerous models. The method is not 
specific to stubborn reduction technique only and it preserves the shortest exe- 
cution sequences. Moreover, once the time gets urgent, other classical (untimed) 
partial order approaches should be applicable too. Our method was instantiated 
to (unbounded) timed-arc Petri nets with discrete time semantics, however, we 
claim that the technique allows for general application to other modelling for- 
malisms like timed automata and timed Petri nets, as well as an extension to 
continuous time. We are currently working on adapting the theory and providing an efficient implementation for UPPAAL-style timed automata with continuous time semantics.


Acknowledgements. We thank Mads Johannsen for his help with the GUI support 
Abstract. We consider the problem of monitoring a Linear Time Logic
(LTL) specification that is defined on infinite paths, over finite traces. 
For example, we may need to draw a verdict on whether the system 
satisfies or violates the property “p holds infinitely often.” The problem 
is that there is always a continuation of a finite trace that satisfies the 
property and a different continuation that violates it. 

We propose a two-step approach to address this problem. First, we 
introduce a counting semantics that computes the number of steps to 
witness the satisfaction or violation of a formula for each position in the 
trace. Second, we use this information to make a prediction on incon- 
clusive suffixes. In particular, we consider a good suffix to be one that 
is shorter than the longest witness for a satisfaction, and a bad suffix to 
be shorter than or equal to the longest witness for a violation. Based on 
this assumption, we provide a verdict assessing whether a continuation 
of the execution on the same system will presumably satisfy or violate 
the property. 


1 Introduction 


Alice is a verification engineer and she is presented with a new exciting and com- 
plex design. The requirements document coming with the design already incor- 
porates functional requirements formalized in Linear Temporal Logic (LTL) [13]. 
The design contains features that are very challenging for exhaustive verification and her favorite model checking tool does not terminate in reasonable time.

Runtime Verification. Alice decides to tackle this problem using runtime verification (RV) [3], a light, yet rigorous verification method. RV drops the exhaustiveness of model checking and analyzes individual traces generated by the system. Thus, it scales much better to industrial-size designs. RV enables automatic generation of monitors from formalized requirements and thus provides a systematic way to check if the system traces satisfy (violate) the specification.


Motivating Example. In particular, Alice considers the following specification:

$\psi \equiv G(request \to F\, grant)$

This LTL formula specifies that every request coming from the environment must be granted by the design in some finite (but unbounded) future. Alice realizes that she is trying to check a liveness property over a set of finite traces. She looks closer at the executions and identifies the two interesting example traces $\tau_1$ and $\tau_2$, depicted in Table 1.

Table 1. Unbounded response property example: traces $\tau_1$ and $\tau_2$ over time steps 1 to 7, each with a request row and a grant row. (A dash is used instead of "$\bot$" to improve the trace readability.)

The monitoring tool reports that both $\tau_1$ and $\tau_2$ presumably violate the unbounded response property. This verdict is against Alice's intuition. The evaluation of trace $\tau_1$ seems right to her: the request at Cycle 1 is followed by a grant at Cycle 3, however the request at Cycle 4 is never granted during that execution. There are good reasons to suspect a bug in the design. Then she looks at $\tau_2$ and observes that after every request the grant is given exactly after 2 cycles. It is true that the last request at Cycle 7 is not followed by a grant, but this seems to happen because the execution ends at that cycle. The past trace observations give reason to think that this request would be followed by a grant in Cycle 9 if the execution was continued. Thus, Alice is not satisfied by the second verdict.

Alice looks closer at the way that the LTL property is evaluated over finite traces. She finds out that temporal operators are given strength: eventually and until are declared as strong operators, while always and weak until are defined to be weak [9]. A strong temporal operator requires all outstanding obligations to be met before the end of the trace. In contrast, a weak temporal operator must not witness any outstanding obligation violation before the end of the trace. Under this interpretation, both $\tau_1$ and $\tau_2$ violate the unbounded response property.

Alice explores another popular approach to evaluate future temporal properties over finite traces: the 3-valued semantics for LTL [4]. In this setting, the Boolean set of verdicts is extended with a third unknown (or maybe) value. A finite trace satisfies (violates) the 3-valued LTL formula if and only if all the infinite extensions of the trace satisfy (violate) the same LTL formula under its classical interpretation. In all other cases, we say that the satisfaction of the formula by the trace is unknown. Alice applies the 3-valued interpretation of LTL on the traces $\tau_1$ and $\tau_2$ to evaluate the unbounded response property. In both situations, she ends up with the unknown verdict. Once again, this is not what she expects and it does not meet her intuition about the satisfaction of the formula by the observed traces.

Alice desires a semantics that evaluates LTL properties on finite traces by 
taking previous observations into account. 


Contributions. In this paper, we study the problem of LTL evaluation over finite traces encountered by Alice and propose a solution. We introduce a new counting semantics for LTL that takes into account the intuition illustrated by the example from Table 1. This semantics computes for every position of a trace two values: the distances to the nearest satisfaction and violation of the co-safety, respectively safety, part of the specification. We use this quantitative information to make predictions about the (infinite) suffixes of the finite observations. We infer from these values the maximum time that we expect for a future obligation to be fulfilled. We compare it to the value that we have for an open obligation at the end of the trace. If the latter is greater (smaller) than the expected maximum value, we have a good indication of a presumed violation (satisfaction) that we report to the user. In particular, our approach will indicate that $\tau_1$ is likely to violate the specification and should be further inspected. In contrast, it will evaluate that $\tau_2$ most likely satisfies the unbounded response property.
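The comparison described above can be illustrated for the unbounded response property with a small Python sketch. It is not the counting semantics of the paper, only the underlying intuition specialised to request/grant traces. The function name, the verdict strings, and the treatment of the case where the two values are equal are assumptions of this sketch, and the request positions used for $\tau_2$ are a hypothetical trace chosen to agree with the prose description.

```python
def presumed_verdict(requests, grants, length):
    """Compare the waiting time of open requests at the end of the trace
    with the longest response time observed earlier in the same trace.

    requests, grants: sets of time steps (1-based) at which the signal holds
    length:           number of time steps in the finite trace
    """
    closed, pending = [], []
    for r in sorted(requests):
        later = [g for g in grants if g >= r]
        if later:
            closed.append(min(later) - r)   # observed response time
        else:
            pending.append(length - r)      # still waiting at trace end
    if not pending:
        return "no open obligation"
    if not closed:
        return "inconclusive"               # nothing to predict from
    longest_seen, waiting = max(closed), max(pending)
    if waiting > longest_seen:
        return "presumably false"
    if waiting < longest_seen:
        return "presumably true"
    return "inconclusive"                   # assumption: ties stay open

# tau_1 as described in the text: requests at 1 and 4, one grant at 3.
print(presumed_verdict({1, 4}, {3}, 7))
# A hypothetical tau_2: every request is granted after exactly 2 cycles.
print(presumed_verdict({1, 4, 7}, {3, 6}, 7))
```

In the first trace the open request has already waited longer than any earlier request did, while in the second the open request was issued in the very last cycle, which is what separates the two verdicts.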


Organization of the Paper. The rest of the paper is organized as follows. We discuss the related work in Sect. 2 and we provide the preliminaries in Sect. 3. In Sect. 4 we present our new counting semantics for LTL and we show how to make predictions about (infinite) suffixes of the finite observations. Section 5 shows the application of our approach to some examples. Finally, in Sect. 6 we draw our conclusions.


2 Related Work 


The finitary interpretation of LTL was first considered in [11], where the authors propose to enrich the logic with the weak next operator that is dual to the (strong) next operator defined on infinite traces. While the strong next requires the existence of a next state, the weak next trivially evaluates to true at the end of the trace. In [9], the authors propose a more semantic approach with weak and strong views for evaluating future obligations at the end of the trace. In essence, the empty word satisfies (violates) every formula according to the weak (strong) view. These two approaches result in the violation of the specification $\psi$ by both traces $\tau_1$ and $\tau_2$.

The authors in [4] propose a 3-valued finitary interpretation of LTL, in which the set {true, false} of verdicts is extended with a third inconclusive verdict. According to the 3-valued LTL, a finite trace satisfies (violates) a specification iff all its infinite extensions satisfy (violate) the same property under the classical LTL interpretation. Otherwise, it evaluates to inconclusive. The main disadvantage of the 3-valued semantics is the dominance of the inconclusive verdict in the evaluation of many interesting LTL formulas. In fact, both $\tau_1$ and $\tau_2$ from Table 1 evaluate to inconclusive against the unbounded response specification $\psi$.

In [5], the authors combine the weak and strong operators with the 3-valued semantics to refine the inconclusive verdict with {presumably true, presumably false}. The strength of the remaining future obligation dictates the presumable verdict. The authors in [12] propose a finitary semantics for each of the LTL (safety, liveness, persistence and recurrence) hierarchy classes that asymptotically converges to the infinite traces semantics of the logic. In these two works, the specification $\psi$ also evaluates to the same verdict for both the traces $\tau_1$ and $\tau_2$.

To summarize, none of the related work handles the unbounded response example from Table 1 in a satisfactory manner. This is due to the fact that these approaches decide about the verdict based on the specification and its remaining future obligations at the end of the trace. In contrast, we propose an approach in which the past observations within the trace are used to predict the future and derive the appropriate verdict. In particular, the application of our semantics for the evaluation of $\psi$ over $\tau_1$ and $\tau_2$ results in presumably false and presumably true verdicts, respectively.

In [17], the authors propose another predictive semantics for LTL. In essence, 
this work assumes that at every point in time the monitor is able to precisely 
predict a segment of the trace that it has not observed yet and produce its 
outcome accordingly. In order to ensure such predictive power, this approach 
requires a white-box setting in which instrumentation and some form of static 
analysis of the systems are needed in order to foresee in advance the upcoming 
observations. This is in contrast to our work, in which the monitor remains a 
passive participant and predicts its verdict only based on the past observations. 

In a different research thread [15], the authors introduce the notion of moni- 
torable specifications that can be positively or negatively determined by a finite 
trace. The monitorability of LTL is further studied in [6,14]. This classifica- 
tion of specifications is orthogonal to our work. We focus on providing a sensible 
evaluation to all LTL properties, including the non-monitorable ones (e.g., GF p). 

We also mention the recent work on statistical model checking for LTL [8]. In 
this work, the authors assume a gray-box setting, where the system-under-test 
(SUT) is a Markov chain with the known minimum transition probability. This 
is in contrast to our work, in which we passively observe existing finite traces 
generated by the SUT, i.e., we have a black-box setting.

In [1], the authors propose extending LTL with a discounting operator and 
study the properties of the augmented logic. The LTL specification formalism 
is extended with path-accumulation assertions in [7]. These LTL extensions are 
motivated by the need for a more quantitative and refined analysis of the systems. 
In our work, the motivation for the counting semantics is quite different. We use 
the quantitative information that we collect during the execution of the trace to 
predict the future behavior of the system and thus improve the quality of the 
monitoring verdict.

3 Preliminaries


We first introduce traces and Linear Temporal Logic (LTL) that we interpret 
over 3-valued semantics. 


Definition 1 (Trace). Let $P$ be a finite set of propositions and let $\Pi = 2^P$. A
(finite or infinite) trace $\pi$ is a sequence $\pi_1, \pi_2, \ldots \in \Pi^* \cup \Pi^\omega$. We denote by
$|\pi| \in \mathbb{N} \cup \{\infty\}$ the length of $\pi$. We denote by $\pi \cdot \pi'$ the concatenation of $\pi \in \Pi^*$
and $\pi' \in \Pi^* \cup \Pi^\omega$.


Definition 2 (Linear Temporal Logic). In this paper, we consider linear 
temporal logic (LTL) and we define its syntax by the grammar: 


\[ \varphi := p \mid \neg\varphi \mid \varphi_1 \vee \varphi_2 \mid X\varphi \mid \varphi_1 \,U\, \varphi_2, \]
where $p \in P$. We denote by $\Phi$ the set of all LTL formulas.


From the basic definition we can derive other standard Boolean and temporal 
operators as follows: 


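\[ \top \equiv p \vee \neg p, \quad \bot \equiv \neg\top, \quad \varphi \wedge \psi \equiv \neg(\neg\varphi \vee \neg\psi), \quad F\varphi \equiv \top \,U\, \varphi, \quad G\varphi \equiv \neg F \neg\varphi. \]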


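Let $\pi \in \Pi^\omega$ be an infinite trace and $\varphi$ an LTL formula. The satisfaction
relation $(\pi, i) \models \varphi$ is defined inductively as follows:

\[
\begin{array}{lcl}
(\pi, i) \models p & \text{iff} & p \in \pi_i, \\
(\pi, i) \models \neg\varphi & \text{iff} & (\pi, i) \not\models \varphi, \\
(\pi, i) \models \varphi_1 \vee \varphi_2 & \text{iff} & (\pi, i) \models \varphi_1 \text{ or } (\pi, i) \models \varphi_2, \\
(\pi, i) \models X\varphi & \text{iff} & (\pi, i+1) \models \varphi, \\
(\pi, i) \models \varphi_1 \,U\, \varphi_2 & \text{iff} & \exists j \ge i \text{ s.t. } (\pi, j) \models \varphi_2 \text{ and } \forall i \le k < j,\ (\pi, k) \models \varphi_1.
\end{array}
\]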


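We now recall the 3-valued semantics from [4]. We denote by $[\pi \models_3 \varphi]$ the
evaluation of $\varphi$ with respect to the trace $\pi \in \Pi^*$ that yields a value in $\{\top, \bot, ?\}$.

\[
[\pi \models_3 \varphi] =
\begin{cases}
\top & \text{if } \forall \pi' \in \Pi^\omega,\ \pi \cdot \pi' \models \varphi, \\
\bot & \text{if } \forall \pi' \in \Pi^\omega,\ \pi \cdot \pi' \not\models \varphi, \\
? & \text{otherwise.}
\end{cases}
\]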


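We now restrict LTL to a fragment without explicit $\top$ and $\bot$ symbols and
with the explicit F operator that we add to the syntax. We provide an alternative
3-valued semantics for this fragment, denoted by $\mu_\pi(\varphi, i)$ where $i \in \mathbb{N}_{>0}$ indicates
a position in or outside the trace. We assume the order $\bot < {?} < \top$, and extend the
Boolean operations to the 3-valued domain with the rules $\neg_3 \top = \bot$, $\neg_3 \bot = \top$
and $\neg_3 ? = ?$ and $\varphi_1 \vee_3 \varphi_2 = \max(\varphi_1, \varphi_2)$. We define the semantics inductively as
follows:

\[
\begin{array}{lcl}
\mu_\pi(p, i) & = &
\begin{cases}
\top & \text{if } i \le |\pi| \text{ and } p \in \pi_i, \\
\bot & \text{else if } i \le |\pi| \text{ and } p \notin \pi_i, \\
? & \text{otherwise,}
\end{cases} \\[2ex]
\mu_\pi(\neg\varphi, i) & = & \neg_3\, \mu_\pi(\varphi, i), \\
\mu_\pi(\varphi_1 \vee \varphi_2, i) & = & \mu_\pi(\varphi_1, i) \vee_3 \mu_\pi(\varphi_2, i), \\
\mu_\pi(X\varphi, i) & = & \mu_\pi(\varphi, i+1), \\[1ex]
\mu_\pi(F\varphi, i) & = &
\begin{cases}
\mu_\pi(\varphi, i) \vee_3 \mu_\pi(XF\varphi, i) & \text{if } i \le |\pi|, \\
\mu_\pi(\varphi, i) & \text{if } i > |\pi|,
\end{cases} \\[2ex]
\mu_\pi(\varphi_1 \,U\, \varphi_2, i) & = &
\begin{cases}
\mu_\pi(\varphi_2, i) \vee_3 (\mu_\pi(\varphi_1, i) \wedge_3 \mu_\pi(X(\varphi_1 \,U\, \varphi_2), i)) & \text{if } i \le |\pi|, \\
\mu_\pi(\varphi_2, i) & \text{if } i > |\pi|.
\end{cases}
\end{array}
\]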


We note that the adapted semantics allows evaluating a finite trace in polynomial
time, in contrast to $[\pi \models_3 \varphi]$, which requires a PSPACE-complete algorithm.
This improvement in complexity comes at a price: the adapted semantics cannot
semantically characterize tautologies and contradictions. We have for example
that $\mu_\pi(p \vee \neg p, 1)$ for the empty word evaluates to ?, despite the fact that $p \vee \neg p$
is semantically equivalent to $\top$. The novel semantics that we introduce in the
following sections make the same tradeoff.
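
The inductive definition of $\mu_\pi$ can be turned into a memoized recursive evaluator, which is one way to see why a finite trace is handled in polynomial time. The following is a minimal sketch, not the authors' implementation: the tuple encoding of formulas, the integer encoding of the three values, and the name `make_mu` are assumptions made for illustration, and the trace at the end is the request/grant trace with r at positions 1, 4, 7 and g at positions 3, 6.

```python
from functools import lru_cache

# Assumed encoding of the 3-valued domain with the order BOT < UNK < TOP.
BOT, UNK, TOP = 0, 1, 2


def make_mu(trace):
    """Return mu(phi, i) for a finite trace given as a list of sets of propositions.

    Formulas are nested tuples (an illustrative encoding):
    ("ap", name), ("not", f), ("or", f, g), ("X", f), ("F", f), ("U", f, g).
    Positions are 1-based; i may point past the end of the trace.
    """
    n = len(trace)

    @lru_cache(maxsize=None)
    def mu(phi, i):
        op = phi[0]
        if op == "ap":
            if i > n:
                return UNK
            return TOP if phi[1] in trace[i - 1] else BOT
        if op == "not":
            return 2 - mu(phi[1], i)          # swaps TOP and BOT, keeps UNK
        if op == "or":
            return max(mu(phi[1], i), mu(phi[2], i))
        if op == "X":
            return mu(phi[1], i + 1)
        if op == "F":
            here = mu(phi[1], i)
            return max(here, mu(phi, i + 1)) if i <= n else here
        if op == "U":
            now = mu(phi[2], i)
            if i > n:
                return now
            return max(now, min(mu(phi[1], i), mu(phi, i + 1)))
        raise ValueError(f"unknown operator {op!r}")

    return mu


p, r, g = ("ap", "p"), ("ap", "r"), ("ap", "g")
tautology = ("or", p, ("not", p))
response = ("not", ("F", ("not", ("or", ("not", r), ("F", g)))))  # G(r -> F g)

mu_empty = make_mu([])
pi2 = [frozenset(s) for s in ({"r"}, set(), {"g"}, {"r"}, set(), {"g"}, {"r"})]
mu_pi2 = make_mu(pi2)
```

On the empty word the tautology `p or not p` stays inconclusive, which is the tradeoff described above; on the example trace the response property is inconclusive as well.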
In the following lemma, we relate the two three-valued semantics. 


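Lemma 3. Given an LTL formula $\varphi$ and a trace $\pi \in \Pi^*$, $|\pi| \neq 0$, we have that

\[
\begin{array}{l}
\mu_\pi(\varphi, 1) = \top \implies [\pi \models_3 \varphi] = \top, \\
\mu_\pi(\varphi, 1) = \bot \implies [\pi \models_3 \varphi] = \bot.
\end{array}
\]

Proof. These two statements can be proven by induction on the structure of the
LTL formula (see Appendix A.1 in [2]). $[\pi \models_3 \varphi] = {?} \implies \mu_\pi(\varphi, 1) = {?}$ is the
consequence of the first two.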


4 Counting Finitary Semantics for LTL 


In this section, we introduce the counting semantics for LTL. We first provide 
necessary definitions in Sect. 4.1, we present the new semantics in Sect. 4.2 and 
finally propose a predictive mapping that transforms the counting semantics into 
a qualitative 5-valued verdict in Sect. 4.3. 


4.1 Definitions 


Let $\mathbb{N}_+ = \mathbb{N}_0 \cup \{\infty, -\}$ be the set of natural numbers (incl. 0) extended with the
two special symbols $\infty$ (infinite) and $-$ (impossible) such that $\forall n \in \mathbb{N}_0$, we define
$n < \infty < -$. We define the addition $\oplus$ of two elements $a, b \in \mathbb{N}_+$ as follows.

Definition 4 (Operator $\oplus$). We define the binary operator $\oplus : \mathbb{N}_+ \times \mathbb{N}_+ \to \mathbb{N}_+$
s.t. for $a \oplus b$ with $a, b \in \mathbb{N}_+$ we have $a + b$ if $a, b \in \mathbb{N}_0$ and $\max\{a, b\}$ otherwise.


We denote by $(s, f)$ a pair of two extended numbers $s, f \in \mathbb{N}_+$. In Definition 5,
we introduce several operations on pairs: (1) the swap between the two values
($\sim$), (2) the increment by 1 of both values ($\oplus 1$), (3) the minmax binary operation
($\sqcup$) that gives the pair consisting of the minimum first value and the maximum
second value, and (4) the maxmin binary operation ($\sqcap$) that is symmetric to ($\sqcup$).

Definition 7 introduces the counting semantics for LTL that for a finite trace
$\pi$ and LTL formula $\varphi$ gives a pair $(s, f) \in \mathbb{N}_+ \times \mathbb{N}_+$. We call $s$ and $f$ satisfaction
and violation witness counts, respectively. Intuitively, the $s$ ($f$) value denotes the
minimal number of additional steps that is needed to witness the satisfaction
(violation) of the formula. The value $\infty$ is used to denote that the property can
be satisfied (violated) only in an infinite number of steps, while $-$ means the
property cannot be satisfied (violated) by any continuation of the trace.


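Definition 5 (Operations $\sim$, $\oplus 1$, $\sqcup$, $\sqcap$). Given two pairs $(s, f) \in \mathbb{N}_+ \times \mathbb{N}_+$
and $(s', f') \in \mathbb{N}_+ \times \mathbb{N}_+$, we have:

\[
\begin{array}{lcl}
\sim(s, f) & = & (f, s), \\
(s, f) \oplus 1 & = & (s \oplus 1, f \oplus 1), \\
(s, f) \sqcup (s', f') & = & (\min(s, s'), \max(f, f')), \\
(s, f) \sqcap (s', f') & = & (\max(s, s'), \min(f, f')).
\end{array}
\]
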
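Example 6. Given the pairs $(0, 0)$, $(\infty, 1)$ and $(7, -)$ we have the following:

\[
\begin{array}{ll}
\sim(0, 0) = (0, 0), & \sim(\infty, 1) = (1, \infty), \\
(0, 0) \oplus 1 = (1, 1), & (\infty, 1) \oplus 1 = (\infty, 2), \\
(0, 0) \sqcup (\infty, 1) = (0, 1), & (\infty, 1) \sqcup (7, -) = (7, -), \\
(0, 0) \sqcap (\infty, 1) = (\infty, 0), & (\infty, 1) \sqcap (7, -) = (\infty, 1).
\end{array}
\]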


Remark. Note that $\mathbb{N}_+ \times \mathbb{N}_+$ forms a lattice where $(s, f) \preceq (s', f')$ when $s \ge s'$
and $f \le f'$ with join $\sqcup$ and meet $\sqcap$. Intuitively, larger values are closer to true.
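
The operations of Definitions 4 and 5 are small enough to check mechanically. The sketch below encodes the extended numbers with a float infinity and a string marker for the impossible symbol; this encoding and all function names are assumptions made for illustration. The pairs used at the end are the ones from Example 6.

```python
INF = float("inf")   # the symbol "infinite"
NEVER = "-"          # the symbol "impossible"


def _key(x):
    """Sort key realising the total order n < INF < NEVER."""
    return (1, 0) if x == NEVER else (0, x)


def oplus(a, b):
    """Definition 4: ordinary sum on naturals, maximum otherwise."""
    if a not in (INF, NEVER) and b not in (INF, NEVER):
        return a + b
    return max(a, b, key=_key)


def swap(p):
    """Swap the satisfaction and violation witness counts."""
    s, f = p
    return (f, s)


def inc(p):
    """Increment both witness counts by one step."""
    s, f = p
    return (oplus(s, 1), oplus(f, 1))


def join(p, q):
    """Minmax: minimum first value, maximum second value."""
    return (min(p[0], q[0], key=_key), max(p[1], q[1], key=_key))


def meet(p, q):
    """Maxmin: maximum first value, minimum second value."""
    return (max(p[0], q[0], key=_key), min(p[1], q[1], key=_key))


def leq(p, q):
    """Lattice order from the remark: p is below q if s >= s' and f <= f'."""
    return _key(p[0]) >= _key(q[0]) and _key(p[1]) <= _key(q[1])


a, b, c = (0, 0), (INF, 1), (7, NEVER)
```

With this order, `join(p, q)` is always above both arguments and `meet(p, q)` below both, which matches the reading that larger pairs are closer to true.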


4.2 Semantics 


We now present our finitary semantics. 


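Definition 7 (Counting finitary semantics). Let $\pi \in \Pi^*$ be a finite trace,
$i \in \mathbb{N}_{>0}$ be a position in or outside the trace and $\varphi \in \Phi$ be an LTL formula. We
define the counting finitary semantics of LTL as the function
$d_\pi : \Phi \times \Pi^* \times \mathbb{N}_{>0} \to \mathbb{N}_+ \times \mathbb{N}_+$ such that:

\[
\begin{array}{lcl}
d_\pi(p, i) & = &
\begin{cases}
(0, -) & \text{if } i \le |\pi| \wedge p \in \pi_i, \\
(-, 0) & \text{if } i \le |\pi| \wedge p \notin \pi_i, \\
(0, 0) & \text{if } i > |\pi|,
\end{cases} \\[2ex]
d_\pi(\neg\varphi, i) & = & \sim d_\pi(\varphi, i), \\
d_\pi(\varphi_1 \vee \varphi_2, i) & = & d_\pi(\varphi_1, i) \sqcup d_\pi(\varphi_2, i), \\
d_\pi(X\varphi, i) & = & d_\pi(\varphi, i+1) \oplus 1, \\[1ex]
d_\pi(F\varphi, i) & = &
\begin{cases}
d_\pi(\varphi, i) \sqcup d_\pi(XF\varphi, i) & \text{if } i \le |\pi|, \\
d_\pi(\varphi, i) \sqcup (-, \infty) & \text{if } i > |\pi|,
\end{cases} \\[2ex]
d_\pi(\varphi_1 \,U\, \varphi_2, i) & = &
\begin{cases}
d_\pi(\varphi_2, i) \sqcup (d_\pi(\varphi_1, i) \sqcap d_\pi(X(\varphi_1 \,U\, \varphi_2), i)) & \text{if } i \le |\pi|, \\
d_\pi(\varphi_2, i) \sqcup (d_\pi(\varphi_1, i) \sqcap (-, \infty)) & \text{if } i > |\pi|.
\end{cases}
\end{array}
\]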


We now provide some motivations behind the above definitions. 


Proposition. A proposition is either evaluated before or after the end of the
trace. If it is evaluated before the end of the trace and the proposition holds,
the satisfaction and violation witness counts are trivially 0 and $-$, respectively.
In the case that the proposition does not hold, we have the symmetric
witness counts. Finally, we take an optimistic view in case of evaluating a
proposition after the end of the trace: The trace can be extended to a trace
with i steps s.t. either p holds or p does not hold.

Negation. Negating a formula simply swaps the witness counts. If we witness
the satisfaction of $\varphi$ in n steps, we witness the violation of $\neg\varphi$ in n steps, and
vice versa.

Disjunction. We take the shorter satisfaction witness count, because the satis- 
faction of one subformula is enough to satisfy the property. And we take the 
longer violation witness count, because both subformulas need to be violated 
to violate the property. 

Next. The next operator naturally increases the witness counts by one step. 

Eventually. We use the rewriting rule $F\varphi \equiv \varphi \vee XF\varphi$ to define the semantics
of the eventually operator. When evaluating the formula after the end of
the trace, we replace the remaining obligation ($XF\varphi$) by $(-, \infty)$. Thus, $F\varphi$
evaluated on the empty word is satisfied by a suffix that satisfies $\varphi$, and it is
violated only by infinite suffixes.

Until. We use the same principle for defining the until semantics that we used for
the eventually operator. We use the rewriting rule $\varphi \,U\, \psi \equiv \psi \vee (\varphi \wedge X(\varphi \,U\, \psi))$.
On the empty word, $\varphi \,U\, \psi$ is satisfied (in the shortest way) by a suffix that
satisfies $\psi$, and it is violated by a suffix that violates both $\varphi$ and $\psi$.


Example 8. We refer to our motivating example from Table 1 and evaluate the
trace $\pi_2$ with respect to the specification $\psi$. We present the outcome in Table 2.
We see that every proposition evaluates to $(0, -)$ when true. The satisfaction
of a proposition that holds at time $i$ is immediately witnessed and it cannot be
violated by any suffix. Similarly, a proposition evaluates to $(-, 0)$ when false.
The valuations of $F g$ count the number of steps to positions in which $g$ holds.
For instance, the first time at which $g$ holds is $i = 3$, hence $F g$ evaluates to
$(2, -)$ at time 1, $(1, -)$ at time 2 and $(0, -)$ at time 3. We also note that $F g$
evaluates to $(0, \infty)$ at the end of the trace: it could be immediately satisfied
with the continuation of the trace with $g$ that holds, but could be violated only
by an infinite suffix in which $g$ never holds. We finally observe that $G(r \to F g)$
evaluates to $(\infty, \infty)$ at all positions: the property can be both satisfied and
violated only with infinite suffixes.


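Table 2. Unbounded response property example: $d_\pi(\varphi, i)$ with the trace $\pi = \pi_2$.

$i$                      | 1        | 2        | 3        | 4        | 5        | 6        | 7        | EOT
$r$                      | $\top$   | $-$      | $-$      | $\top$   | $-$      | $-$      | $\top$   |
$g$                      | $-$      | $-$      | $\top$   | $-$      | $-$      | $\top$   | $-$      |
$d_\pi(r, i)$            | $(0,-)$  | $(-,0)$  | $(-,0)$  | $(0,-)$  | $(-,0)$  | $(-,0)$  | $(0,-)$  | $(0,0)$
$d_\pi(g, i)$            | $(-,0)$  | $(-,0)$  | $(0,-)$  | $(-,0)$  | $(-,0)$  | $(0,-)$  | $(-,0)$  | $(0,0)$
$d_\pi(\neg r, i)$       | $(-,0)$  | $(0,-)$  | $(0,-)$  | $(-,0)$  | $(0,-)$  | $(0,-)$  | $(-,0)$  | $(0,0)$
$d_\pi(F g, i)$          | $(2,-)$  | $(1,-)$  | $(0,-)$  | $(2,-)$  | $(1,-)$  | $(0,-)$  | $(1,\infty)$ | $(0,\infty)$
$d_\pi(r \to F g, i)$    | $(2,-)$  | $(0,-)$  | $(0,-)$  | $(2,-)$  | $(0,-)$  | $(0,-)$  | $(1,\infty)$ | $(0,\infty)$
$d_\pi(G(r \to F g), i)$ | $(\infty,\infty)$ | $(\infty,\infty)$ | $(\infty,\infty)$ | $(\infty,\infty)$ | $(\infty,\infty)$ | $(\infty,\infty)$ | $(\infty,\infty)$ | $(\infty,\infty)$

We use "$-$" instead of "$\bot$" in the traces r and g to improve the readability.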
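
The rows of Table 2 can be recomputed directly from Definition 7. The following sketch is an illustration under stated assumptions rather than the authors' tool: formulas are encoded as nested tuples, the impossible symbol is a string marker, G and implication are expanded into the core operators, and the names `counting_semantics`, `pi2` and `response` are hypothetical.

```python
from functools import lru_cache

INF = float("inf")   # "infinite"
NEVER = "-"          # "impossible"


def _key(x):
    return (1, 0) if x == NEVER else (0, x)    # n < INF < NEVER


def oplus(a, b):
    if a not in (INF, NEVER) and b not in (INF, NEVER):
        return a + b
    return max(a, b, key=_key)


def swap(p):
    return (p[1], p[0])


def inc(p):
    return (oplus(p[0], 1), oplus(p[1], 1))


def join(p, q):
    return (min(p[0], q[0], key=_key), max(p[1], q[1], key=_key))


def meet(p, q):
    return (max(p[0], q[0], key=_key), min(p[1], q[1], key=_key))


def counting_semantics(trace):
    """Return d(phi, i) for a finite trace (list of frozensets), 1-based positions."""
    n = len(trace)

    @lru_cache(maxsize=None)
    def d(phi, i):
        op = phi[0]
        if op == "ap":
            if i > n:
                return (0, 0)
            return (0, NEVER) if phi[1] in trace[i - 1] else (NEVER, 0)
        if op == "not":
            return swap(d(phi[1], i))
        if op == "or":
            return join(d(phi[1], i), d(phi[2], i))
        if op == "X":
            return inc(d(phi[1], i + 1))
        # For F and U the remaining obligation is X(phi) inside the trace
        # and the pair (NEVER, INF) after its end.
        rest = inc(d(phi, i + 1)) if i <= n else (NEVER, INF)
        if op == "F":
            return join(d(phi[1], i), rest)
        if op == "U":
            return join(d(phi[2], i), meet(d(phi[1], i), rest))
        raise ValueError(f"unknown operator {op!r}")

    return d


r, g = ("ap", "r"), ("ap", "g")
eventually_g = ("F", g)
implication = ("or", ("not", r), eventually_g)            # r -> F g
response = ("not", ("F", ("not", implication)))           # G(r -> F g)

pi2 = [frozenset(s) for s in ({"r"}, set(), {"g"}, {"r"}, set(), {"g"}, {"r"})]
d = counting_semantics(pi2)
row_fg = [d(eventually_g, i) for i in range(1, 9)]        # positions 1..7 and EOT
row_impl = [d(implication, i) for i in range(1, 9)]
row_resp = [d(response, i) for i in range(1, 9)]
```

Memoizing on the pair (subformula, position) keeps the number of evaluated cases proportional to the formula size times the trace length.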


Not all pairs $(s, f) \in \mathbb{N}_+ \times \mathbb{N}_+$ are possible according to the counting semantics.
We present the possible pairs in Lemma 9.


Lemma 9. Let $\pi \in \Pi^*$ be a finite trace, $\varphi$ an LTL formula and $i \in \mathbb{N}_{>0}$ an index.
We have that $d_\pi(\varphi, i)$ is of the form $(a, -)$, $(-, a)$, $(b_1, b_2)$, $(b_1, \infty)$, $(\infty, b_2)$ or
$(\infty, \infty)$, where $a \le |\pi| - i$ and $b_j > |\pi| - i$ for $j \in \{1, 2\}$.

Proof. The proof can be obtained using structural induction on the LTL formula
(see Appendix A.2 in [2]).


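Finally, we relate our counting semantics to the three-valued semantics in
Lemma 10.


Lemma 10. Given a trace $\pi \in \Pi^*$, an index $i \in \mathbb{N}_{>0}$ and an LTL formula $\varphi$,
we have that

\[
\begin{array}{lcl}
d_\pi(\varphi, i) = (a, -) & \iff & \mu_\pi(\varphi, i) = \top \text{ and } \forall x < a,\ \pi' = \pi_i \cdot \pi_{i+1} \cdots \pi_{i+x}:\ \mu_{\pi'}(\varphi, 1) = {?}, \\
d_\pi(\varphi, i) = (-, a) & \iff & \mu_\pi(\varphi, i) = \bot \text{ and } \forall x < a,\ \pi' = \pi_i \cdot \pi_{i+1} \cdots \pi_{i+x}:\ \mu_{\pi'}(\varphi, 1) = {?}, \\
d_\pi(\varphi, i) = (b_1, b_2) & \iff & \mu_\pi(\varphi, i) = {?},
\end{array}
\]
where $a \le |\pi| - i$ and $b_j$ is either $\infty$ or $b_j > |\pi| - i$ for $j \in \{1, 2\}$.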


Intuitively, Lemma 10 holds because we only introduce the symbol "$-$" within
the trace when a satisfaction (violation) is observed. And the values of a pair
only propagate into the past (and never into the future).

4.3 Evaluation


We now propose a mapping that predicts a qualitative verdict from our counting
semantics. We adopt a 5-valued set consisting of true ($\top$), presumably true ($\top_P$),
inconclusive (?), presumably false ($\bot_P$) and false ($\bot$) verdicts. We define the
following order over these five values: $\bot < \bot_P < {?} < \top_P < \top$. We equip this
5-valued domain with the negation ($\neg$) and disjunction ($\vee$) operations, letting
$\neg\top = \bot$, $\neg\top_P = \bot_P$, $\neg ? = ?$, $\neg\bot_P = \top_P$, $\neg\bot = \top$ and $\varphi_1 \vee \varphi_2 = \max\{\varphi_1, \varphi_2\}$.
We define other Boolean operators such as conjunction by the usual logical
equivalences ($\varphi_1 \wedge \varphi_2 = \neg(\neg\varphi_1 \vee \neg\varphi_2)$, etc.).
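
The 5-valued domain is a chain that is symmetric around the inconclusive verdict, so negation can be modelled as a sign flip and disjunction as a maximum. The sketch below uses an integer enumeration for this; the class and function names are assumptions chosen for illustration.

```python
from enum import IntEnum


class Verdict(IntEnum):
    """The five verdicts, ordered from false to true."""
    FALSE = -2
    PRESUMABLY_FALSE = -1
    INCONCLUSIVE = 0
    PRESUMABLY_TRUE = 1
    TRUE = 2


def neg(v):
    """Negation mirrors a verdict around INCONCLUSIVE."""
    return Verdict(-v)


def disj(a, b):
    """Disjunction is the maximum in the verdict order."""
    return max(a, b)


def conj(a, b):
    """Conjunction via De Morgan, which amounts to the minimum."""
    return neg(disj(neg(a), neg(b)))
```

For instance, `conj(Verdict.TRUE, Verdict.PRESUMABLY_TRUE)` yields `Verdict.PRESUMABLY_TRUE`, the weaker of the two verdicts.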

We evaluate a property on a trace to $\top$ ($\bot$) when the satisfaction (violation)
can be fully determined from the trace, following the definition of the three-valued
semantics $\mu$. Intuitively, this takes care of the case in which the safety
(co-safety) part of a formula has been violated (satisfied), at least for properties
that are intentionally safe (intentionally co-safe, resp.) [10].

Whenever the truth value is not determined, we distinguish whether $d_\pi(\varphi, i)$
indicates the possibility for a satisfaction, respectively violation, in finite time or
not. For possible satisfactions, respectively violations, in finite time we make a
prediction on whether past observations support the belief that the trace is
going to satisfy or violate the property. If the predictions are not inconclusive
and not contradicting, then we evaluate the trace to the (presumable) truth
value $\top_P$ or $\bot_P$. If we cannot make a prediction to a truth value, we compute
the truth value recursively based on the operator in the formula and the truth
values of the subformulas (with temporal operators unrolled).

We use the predicate $pred_\pi$ to give the prediction based on the observed
witnesses for satisfaction. The predicate $pred_\pi(\varphi, i)$ becomes ? when no witness
for satisfaction exists in the past. When there exists a witness that requires at
least the same amount of additional steps as the trace under evaluation then the
predicate evaluates to $\top$. If all the existing witnesses (and at least one exists)
are shorter than the current trace, then the predicate evaluates to $\bot$. For a
prediction on the violation we make a prediction on the satisfaction of $d_\pi(\neg\varphi, i)$,
i.e., we compute $pred_\pi(\neg\varphi, i)$.


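Definition 11 (Prediction predicate). Let s, f denote natural numbers and
let $s_\pi(\varphi, i), f_\pi(\varphi, i) \in \mathbb{N}_+$ such that $d_\pi(\varphi, i) = (s_\pi(\varphi, i), f_\pi(\varphi, i))$. We define the
3-valued predicate $pred_\pi$ as

\[
pred_\pi(\varphi, i) =
\begin{cases}
\top & \text{if } \exists j < i.\ d_\pi(\varphi, j) = (s', -) \text{ and } s_\pi(\varphi, i) \le s', \\
? & \text{if } \nexists j < i.\ d_\pi(\varphi, j) = (s', -), \\
\bot & \text{if } \exists j < i.\ d_\pi(\varphi, j) = (s', -) \text{ and } s_\pi(\varphi, i) > \max_{0 < j < i} \{ s' \mid d_\pi(\varphi, j) = (s', -) \}.
\end{cases}
\]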
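
The prediction predicate only needs the pairs observed at earlier positions and the current satisfaction witness count. A minimal sketch follows, assuming the earlier pairs are passed in as a list and using illustrative names (`pred`, `history`, `s_now`) and string labels for the three outcomes; none of these come from the paper.

```python
INF = float("inf")   # "infinite"
NEVER = "-"          # "impossible"


def _key(x):
    return (1, 0) if x == NEVER else (0, x)    # n < INF < NEVER


def pred(history, s_now):
    """Prediction predicate over past pairs.

    history: the pairs d(phi, j) for all positions j < i, in any order.
    s_now:   the satisfaction witness count of d(phi, i).
    Returns "top", "unknown" or "bottom".
    """
    # Past satisfaction witnesses are the pairs whose violation count is NEVER.
    witnesses = [s for (s, f) in history if f == NEVER]
    if not witnesses:
        return "unknown"
    longest = max(witnesses, key=_key)
    # A past witness at least as long as the current requirement supports satisfaction.
    return "top" if _key(s_now) <= _key(longest) else "bottom"


# Pairs of F g at positions 1..6 of the request/grant trace (cf. Table 2).
past_fg = [(2, NEVER), (1, NEVER), (0, NEVER), (2, NEVER), (1, NEVER), (0, NEVER)]
```

At position 7 the pair of F g is (1, infinity); since a witness of length 2 was observed earlier, `pred(past_fg, 1)` supports a presumably true verdict. If every earlier grant had been immediate, the same call would return "bottom".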


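For the evaluation we consider a case split among the possible combinations
of values in the pairs.

Definition 12 (Predictive evaluation). We define the predictive evaluation
function $e_\pi(\varphi, i)$, with $a \le |\pi| - i$ and $b_j > |\pi| - i$ for $j \in \{1, 2\}$ and $a, b_j \in \mathbb{N}_0$,
for the different cases of $d_\pi(\varphi, i)$:

\[
e_\pi(\varphi, i) =
\begin{cases}
\top & \text{if } d_\pi(\varphi, i) = (a, -), \\
\top_P & \text{if } d_\pi(\varphi, i) = (b_1, b_2) \text{ and } pred_\pi(\varphi, i) > pred_\pi(\neg\varphi, i), \\
r_\pi(\varphi, i) & \text{if } d_\pi(\varphi, i) = (b_1, b_2) \text{ and } pred_\pi(\varphi, i) = pred_\pi(\neg\varphi, i), \\
\bot_P & \text{if } d_\pi(\varphi, i) = (b_1, b_2) \text{ and } pred_\pi(\varphi, i) < pred_\pi(\neg\varphi, i), \\
\top_P & \text{if } d_\pi(\varphi, i) = (b_1, \infty) \text{ and } pred_\pi(\varphi, i) = \top, \\
r_\pi(\varphi, i) & \text{if } d_\pi(\varphi, i) = (b_1, \infty) \text{ and } pred_\pi(\varphi, i) = {?}, \\
\bot_P & \text{if } d_\pi(\varphi, i) = (b_1, \infty) \text{ and } pred_\pi(\varphi, i) = \bot, \\
\neg e_\pi(\neg\varphi, i) & \text{if } d_\pi(\varphi, i) = (\infty, b_1), \\
r_\pi(\varphi, i) & \text{if } d_\pi(\varphi, i) = (\infty, \infty), \\
\bot & \text{if } d_\pi(\varphi, i) = (-, a),
\end{cases}
\]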

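where $r_\pi(\varphi, i)$ is an auxiliary function defined inductively as follows:

\[
\begin{array}{lcl}
r_\pi(p, i) & = & ?, \\
r_\pi(\neg\varphi, i) & = & \neg e_\pi(\varphi, i), \\
r_\pi(\varphi_1 \vee \varphi_2, i) & = & e_\pi(\varphi_1, i) \vee e_\pi(\varphi_2, i), \\
r_\pi(X^n \varphi, i) & = & e_\pi(\varphi, i + n), \\[1ex]
r_\pi(F\varphi, i) & = &
\begin{cases}
e_\pi(\varphi, i) \vee r_\pi(XF\varphi, i) & \text{if } i \le |\pi|, \\
e_\pi(\varphi, i) & \text{if } i > |\pi|,
\end{cases} \\[2ex]
r_\pi(\varphi_1 \,U\, \varphi_2, i) & = &
\begin{cases}
e_\pi(\varphi_2, i) \vee (e_\pi(\varphi_1, i) \wedge e_\pi(X(\varphi_1 \,U\, \varphi_2), i)) & \text{if } i \le |\pi|, \\
e_\pi(\varphi_2, i) & \text{if } i > |\pi|.
\end{cases}
\end{array}
\]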


The predictive evaluation function is symmetric. Hence, $e_\pi(\varphi, i) = \neg e_\pi(\neg\varphi, i)$
holds.


Example 13. The outcome of evaluating $\pi_2$ from Table 1 is shown in Table 3.
Subformula $r \to F g$ is predicted to be $\top_P$ at $i = 7$ because there exists a longer
witness for satisfaction in the past (e.g., at $i = 1$). Thus, the trace evaluates to
$\top_P$, as expected.


Table 3. Unbounded response property example with $\pi = \pi_2$.

$i$                      | 1        | 2        | 3        | 4        | 5        | 6        | 7        | EOT
$r$                      | $\top$   | $-$      | $-$      | $\top$   | $-$      | $-$      | $\top$   |
$g$                      | $-$      | $-$      | $\top$   | $-$      | $-$      | $\top$   | $-$      |
$d_\pi(r, i)$            | $(0,-)$  | $(-,0)$  | $(-,0)$  | $(0,-)$  | $(-,0)$  | $(-,0)$  | $(0,-)$  | $(0,0)$
$e_\pi(r, i)$            | $\top$   | $\bot$   | $\bot$   | $\top$   | $\bot$   | $\bot$   | $\top$   | ?
$d_\pi(g, i)$            | $(-,0)$  | $(-,0)$  | $(0,-)$  | $(-,0)$  | $(-,0)$  | $(0,-)$  | $(-,0)$  | $(0,0)$
$e_\pi(g, i)$            | $\bot$   | $\bot$   | $\top$   | $\bot$   | $\bot$   | $\top$   | $\bot$   | ?
$d_\pi(F g, i)$          | $(2,-)$  | $(1,-)$  | $(0,-)$  | $(2,-)$  | $(1,-)$  | $(0,-)$  | $(1,\infty)$ | $(0,\infty)$
$e_\pi(F g, i)$          | $\top$   | $\top$   | $\top$   | $\top$   | $\top$   | $\top$   | $\top_P$ | $\top_P$
$d_\pi(r \to F g, i)$    | $(2,-)$  | $(0,-)$  | $(0,-)$  | $(2,-)$  | $(0,-)$  | $(0,-)$  | $(1,\infty)$ | $(0,\infty)$
$e_\pi(r \to F g, i)$    | $\top$   | $\top$   | $\top$   | $\top$   | $\top$   | $\top$   | $\top_P$ | $\top_P$
$d_\pi(G(r \to F g), i)$ | $(\infty,\infty)$ | $(\infty,\infty)$ | $(\infty,\infty)$ | $(\infty,\infty)$ | $(\infty,\infty)$ | $(\infty,\infty)$ | $(\infty,\infty)$ | $(\infty,\infty)$
$e_\pi(G(r \to F g), i)$ | $\top_P$ | $\top_P$ | $\top_P$ | $\top_P$ | $\top_P$ | $\top_P$ | $\top_P$ | $\top_P$

We use "$-$" instead of "$\bot$" in the traces r and g to improve the readability.


In Fig. 1 we visualize the evaluation of a pair $d_\pi(\varphi, i) = (s, f)$ for a fixed $\varphi$
and a fixed position $i$. On the x-axis is the witness count $s$ for a satisfaction and
on the y-axis is the witness count $f$ for a violation. For a value $s$, respectively
$f$, that is smaller than the length of the suffix starting at position $i$ (with the
other value of the pair always being $-$), the evaluation is either $\top$ or $\bot$.
Otherwise the evaluation depends on the values $s_{max}$ and $f_{max}$. These two values
represent the largest witness counts for a satisfaction and a violation in the past,
i.e., for positions smaller than $i$ in the trace. Based on the prediction function
$pred_\pi(\varphi, i)$ the evaluation becomes $\top_P$, ? or $\bot_P$, where ? indicates that the
auxiliary function $r_\pi(\varphi, i)$ has to be applied. Starting at an arbitrary point in the
diagram and moving to the right increases the witness count for a satisfaction
while the witness count for a violation remains constant. Thus, moving to the
right makes the pair "more false". The same holds when keeping the witness
count for a satisfaction constant and moving up in the diagram as this decreases
the witness count for a violation. Analogously, moving down and/or left makes
the pair "more true" as the witness count for a violation gets larger and/or the
witness count for a satisfaction gets smaller.
Our 5-valued predictive evaluation refines the 3-valued LTL semantics. 


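Theorem 14. Let $\varphi$ be an LTL formula, $\pi \in \Pi^*$ and $i \in \mathbb{N}_{>0}$. We have

\[
\begin{array}{lcl}
\mu_\pi(\varphi, i) = \top & \iff & e_\pi(\varphi, i) = \top, \\
\mu_\pi(\varphi, i) = \bot & \iff & e_\pi(\varphi, i) = \bot, \\
\mu_\pi(\varphi, i) = {?} & \iff & e_\pi(\varphi, i) \in \{\top_P, ?, \bot_P\}.
\end{array}
\]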


Theorem 14 holds, because the evaluation to $\top$ and $\bot$ is simply the mapping
of a pair that contains the symbol "$-$", which we have shown in Lemma 10.

Remember that $\mathbb{N}_+ \times \mathbb{N}_+$ is partially ordered by $\preceq$. We now show that having
a trace that is "more true" than another is correctly reflected in our finitary
semantics. To define "more true", we first need the polarity of a proposition in
an LTL formula.


Example 15. Note that $g$ has positive polarity in $\varphi = G(r \to F g)$. If we define
$\pi_2'$ to be as $\pi_2$, except that $g \in \pi_2'(i)$ for $i \in \{1, \ldots, 6\}$, we have $e_{\pi_2'}(\varphi, i) = \bot_P$,
whereas $e_{\pi_2}(\varphi, i) = \top_P$.


Fig. 1. Lattice for $(s, f)$ with $\varphi$ and $i \le |\pi|$ fixed.


Definition 16 (Polarity). Let $\#\neg$ be the number of negation operators on a
specific path in the parse tree of $\varphi$ starting at the root. We define the polarity as
the function pol(p) with proposition p in an LTL formula $\varphi$ as follows:

\[
pol(p) =
\begin{cases}
pos & \text{if } \#\neg \text{ on all paths to a leaf with proposition } p \text{ is even,} \\
neg & \text{if } \#\neg \text{ on all paths to a leaf with proposition } p \text{ is odd,} \\
mixed & \text{otherwise.}
\end{cases}
\]
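
Polarity can be computed with a single traversal of the parse tree that tracks the parity of negations on the current path. The sketch below assumes a nested-tuple encoding of formulas in which derived operators are expanded into the core syntax (implication as a disjunction with a negated left operand, G as a doubly negated F); the encoding and the names `negation_parities` and `pol` are illustrative assumptions.

```python
def negation_parities(phi, negs=0, acc=None):
    """Collect, per proposition, the parities of negation counts over all paths."""
    acc = {} if acc is None else acc
    op = phi[0]
    if op == "ap":
        acc.setdefault(phi[1], set()).add(negs % 2)
    elif op == "not":
        negation_parities(phi[1], negs + 1, acc)
    else:
        # "or", "X", "F" and "U" pass the current count on to every operand.
        for sub in phi[1:]:
            negation_parities(sub, negs, acc)
    return acc


def pol(phi, p):
    """Return "pos", "neg" or "mixed" for a proposition p occurring in phi."""
    parities = negation_parities(phi)[p]   # KeyError if p does not occur in phi
    if parities == {0}:
        return "pos"
    if parities == {1}:
        return "neg"
    return "mixed"


r, g, q = ("ap", "r"), ("ap", "g"), ("ap", "q")
response = ("not", ("F", ("not", ("or", ("not", r), ("F", g)))))   # G(r -> F g)
both_ways = ("or", q, ("not", q))
```

In the response property the grant g sits under two negations and is positive, while the request r sits under three and is negative.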


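With the polarity defined, we now define the constraints for a trace to be
"more true" with respect to an LTL formula $\varphi$.


Definition 17 ($\pi \sqsubseteq_\varphi \pi'$). Given two traces $\pi$ and $\pi'$ of equal length and an
LTL formula $\varphi$ over proposition p, we define that $\pi \sqsubseteq_\varphi \pi'$ iff

\[
\begin{array}{ll}
\forall i\, \forall p. & pol(p) = mixed \implies (p \in \pi_i \iff p \in \pi'_i) \text{ and} \\
 & pol(p) = pos \implies (p \in \pi_i \implies p \in \pi'_i) \text{ and} \\
 & pol(p) = neg \implies (p \in \pi_i \impliedby p \in \pi'_i).
\end{array}
\]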

Whenever one trace is “more true” than another, this is correctly reflected 
in our finitary semantics. 


Theorem 18. For two traces $\pi$ and $\pi'$ of equal length and an LTL formula $\varphi$
over proposition p, we have that

\[ \pi \sqsubseteq_\varphi \pi' \implies d_\pi(\varphi, 1) \preceq d_{\pi'}(\varphi, 1). \]

Therefore, we have for $\pi \sqsubseteq_\varphi \pi'$ that

\[
\begin{array}{l}
e_\pi(\varphi, 1) = \top \implies e_{\pi'}(\varphi, 1) = \top, \text{ and} \\
e_\pi(\varphi, 1) = \bot \impliedby e_{\pi'}(\varphi, 1) = \bot.
\end{array}
\]

Theorem 18 holds, because we have that replacing an arbitrary observed value
in $\pi$ by one with positive polarity in $\pi'$ always results with $d_\pi(\varphi, 1) = (s, f)$ and
$d_{\pi'}(\varphi, 1) = (s', f')$ in $s' \le s$ and $f' \ge f$, as with $\pi \sqsubseteq_\varphi \pi'$ we have that $\pi'$
witnesses a satisfaction of $\varphi$ not later than $\pi$ and $\pi'$ also witnesses a violation of
$\varphi$ not earlier than $\pi$.


Table 4. Making a system "more true".


$ m |dr(¢ġ,1)|er(ġ,1) $ T d. (ó, 1)| ez (, 1) 
Latest lee 

p^XFp|- — Gu ip GFp ESTE T 
5» |T D Tp pv xG] | cap | Te 
=o | oe 


In Table 4 we give examples to illustrate the transition of one evaluation
to another one. Note that it is possible to change from $\top_P$ to $\bot_P$. However,
this is only the predicted truth value that becomes "worse", because we have
strengthened the prefix on which the prediction is based. The values of $d_\pi(\varphi, i)$
do not change in such a case.


5 Examples


We demonstrate the strengths and weaknesses of our approach on the examples
of LTL specifications and traces shown in Table 5. We fully develop these
examples in Appendix B in [2].


Table 5. Examples of LTL specifications and traces

Specifications:
$\psi_1 \equiv F X g$
$\psi_2 \equiv G X g$
$\psi_3 \equiv G(r \to F g)$
$\psi_4 \equiv \bigwedge_{i \in \{1,2\}} G(r_i \to F g_i)$
$\psi_5 \equiv G((X r) \,U\, (X X g))$
$\psi_6 \equiv F G g \vee F G \neg g$
$\psi_7 \equiv G(F r \vee F g)$
$\psi_8 \equiv G F (r \vee g)$
$\psi_9 \equiv G F r \vee G F g$

Traces (signals per trace; the waveforms are not reproduced here):
$\pi_1$: g; $\pi_2$: g; $\pi_3$: r, g; $\pi_4$: $r_1$, $g_1$, $r_2$, $g_2$;
$\pi_5$: r, g; $\pi_6$: g; $\pi_7$: g; $\pi_8$: r, g

Table 6 summarizes the evaluation of our examples. The first and the second
column denote the evaluated specification and trace. We use these examples to
compare LTL with counting semantics (c-LTL) presented in this paper, to the
other two popular finitary LTL interpretations, the 3-valued LTL semantics [4]
(3-LTL) and LTL on truncated paths [9] (t-LTL). We recall that in t-LTL there
is a distinction between a weak and a strong next operator. We denote by
t-LTL-s (t-LTL-w) the specifications from our examples in which X is interpreted
as the strong (weak) next operator and assume that we always give a strong
interpretation to U and F and a weak interpretation to G.


Table 6. Comparison of different verdicts with different semantics 


Spec. | Trace |c-LTL |3-LTL |t-LTL-s |t-LTL-w Spec. | Trace |c-LTL |3-LTL |t-LTL-s|t-LTL-w 
pı mı | Le ? alt T We tT | lp ? 
p2 | m2 | Te T L T Ve | "77 | Te 4 
Vs | ma | Le | ? V; | ms | le | ? 
pa | Ta | Tp ? Vs | Tts | Lp F 
Vs | ms | Te T all. T Vo | ms | Tp ee 


There are two immediate observations that we can make regarding the results 
presented in Table 6. First, the 3-valued LTL gives for all the examples an incon- 
clusive verdict, a feedback that after all has little value to a verification engineer. 
The second observation is that the verdicts from c-LTL and t-LTL can differ quite 
a lot, which is not very surprising given the different strategies to interpret the 
unseen future. We now further comment on these examples, explaining the results
in more detail and highlighting the intuitive outcomes of c-LTL for a large
class of interesting LTL specifications.


Effect of Nested Next. We evaluate with $\psi_1$ and $\psi_2$ the effect of nesting X in
an F and a G formula, respectively. We make a prediction on $X g$ at the end
of the trace before evaluating F and G. As a consequence, we find that $(\psi_1, \pi_1)$
evaluates to presumably false, while $(\psi_2, \pi_2)$ evaluates to presumably true. In
t-LTL, this class of specification is very sensitive to the weak/strong interpretation
of next, as we can see from the verdicts.


Request/Grants. We evaluate the request/grant property $\psi_3$ from the motivating
example on the trace $\pi_3$. We observe that r at cycle 2 is followed by g at cycle
3, while r at cycle 5 is not followed by g at cycle 6. Hence, $(\psi_3, \pi_3)$ evaluates to
presumably false.


Concurrent Request/Grants. We evaluate the specification $\psi_4$ against the trace
$\pi_4$. In this example $r_1$ is triggered at even time stamps and $r_2$ is triggered at odd
time stamps. Every request is granted in one cycle. It follows that regardless of
the time when the trace ends, there is one request that is not granted yet. We
note that $\psi_4$ is a conjunction of two basic request/grant properties and we make
independent predictions for each conjunct. Every basic request/grant property
is evaluated to presumably true, hence $(\psi_4, \pi_4)$ evaluates to presumably true. At
this point, we note that in t-LTL, every request that is not granted by the end of
the trace results in the property violation, regardless of the past observations.


Until. We use the specification $\psi_5$ and the trace $\pi_5$ to evaluate the effect of U on
the predictions. The specification requires that $X r$ continuously holds until $X X g$
becomes true. We can see that in $\pi_5$ $X r$ is witnessed at cycles 1 to 4, while $X X g$
is witnessed at cycle 5. We can also see that $X r$ is again witnessed from cycle 6
until the end of the trace at cycle 8. As a consequence, $(\psi_5, \pi_5)$ is evaluated to
presumably true.


Stabilization. The specification $\psi_6$ says that the value of g has to eventually
stabilize to either true or false. We evaluate the formula on two traces $\pi_6$ and
$\pi_7$. In the trace $\pi_6$, g alternates between true and false every two cycles and
becomes true in the last cycle. Hence, there is no sufficiently long witness of
trace stabilization, and $(\psi_6, \pi_6)$ evaluates to presumably false. In the trace $\pi_7$, g also
alternates between true and false every two cycles, but in the last four cycles g
remains continuously true. As a consequence, $(\psi_6, \pi_7)$ evaluates to presumably
true. This example also illustrates the importance of when the trace truncation
occurs. If both $\pi_6$ and $\pi_7$ were truncated at cycle 5, both $(\psi_6, \pi_6)$ and $(\psi_6, \pi_7)$
would evaluate to presumably false. We note that $\psi_6$ is satisfied by all traces in
t-LTL.


Sub-formula Domination. The specification $\psi_7$ exposes a weakness of our approach.
It requires that in every cycle, either r or g is witnessed in some unbounded
future. With our approach, $(\psi_7, \pi_8)$ evaluates to presumably false. This is against
our intuition because we have observed that g becomes regularly true every second
time step. However, in this example our prediction for $F r$ dominates over
the prediction for $F g$, leading to the unexpected presumably false verdict. On the
other hand, t-LTL interpretation of the same specification is dependent only on
the last value of r and g.


Semantically Equivalent Formulas. We now demonstrate that our approach may
give different answers for semantically equivalent formulas. For instance, both
$\psi_8$ and $\psi_9$ are semantically equivalent to $\psi_7$. We have that $(\psi_8, \pi_8)$ evaluates to
presumably false, while $(\psi_9, \pi_8)$ evaluates to presumably true. We note that t-LTL
verdicts are stable for semantically equivalent formulas.


6 Conclusion 


We have presented a novel finitary semantics for LTL that uses the history 
of satisfaction and violation in a finite trace to predict whether the co-safety
and safety aspects of a formula will be satisfied in the extension of the trace
to an infinite one. We claim that the semantics closely follow human intuition 
when predicting the truth value of a trace. The presented examples (incl. non- 
monitorable LTL properties) illustrate our approach and support this claim. 
Our definition of the semantics is trace-based, but it is easily extended to take 
an entire database of traces into account, which may make the approach more 
precise. Our approach currently uses a very simple form of learning to predict 
the future. We would like to consider more sophisticated statistical methods to 
make better predictions. In particular, we plan to apply nonparametric statisti- 
cal methods (i.e., the Wilcoxon signed-rank test [16]), in combination with our 
counting semantics, to identify and quantify the traces that are outliers.

Abstract. We present Rabinizer 4, a tool set for translating formulae of
linear temporal logic to different types of deterministic $\omega$-automata. The
tool set implements and optimizes several recent constructions, includ- 
ing the first implementation translating the frequency extension of LTL. 
Further, we provide a distribution of PRISM that links Rabinizer and 
offers model checking procedures for probabilistic systems that are not 
in the official PRISM distribution. Finally, we evaluate the performance 
and in cases with any previous implementations we show enhancements 
both in terms of the size of the automata and the computational time, 
due to algorithmic as well as implementation improvements. 


1 Introduction 


The automata-theoretic approach [VW86] is a key technique for verification and
synthesis of systems with linear-time specifications, such as formulae of linear 
temporal logic (LTL) [Pnu77]. It proceeds in two steps: first, the formula is 
translated into a corresponding automaton; second, the product of the system 
and the automaton is further analyzed. The size of the automaton is important 
as it directly affects the size of the product and thus largely also the analysis 
time, particularly for deterministic automata and probabilistic model checking 
in a very direct proportion. For verification of non-deterministic systems, mostly 
non-deterministic Büchi automata (NBA) are used [EH00,SB00, GO01, GL02, 
BKRS12, DLLF+16] since they are typically very small and easy to produce. 


Probabilistic LTL model checking cannot profit directly from NBA. Even 
the qualitative question, whether a formula holds with probability 0 or 1, requires 
automata with at least a restricted form of determinism. The prime examples are
the limit-deterministic (also called semi-deterministic) Büchi automata (LDBA) 
[CY88] and the generalized LDBA (LDGBA). However, for the general quanti- 
tative questions, where the probability of satisfaction is computed, general limit- 
determinism is not sufficient. Instead, deterministic Rabin automata (DRA) have 


This work has been partially supported by the Czech Science Foundation grant No. 
P202/12/G061 and the German Research Foundation (DFG) project KR 4890/1-1 
“Verified Model Checkers” (317422601). A part of the frequency extension has been 
implemented within Google Summer of Code 2016.
H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 567-577, 2018.

[Figure: translation graph over LTL, NBA, DGRA, DRA, DPA, LDGBA and LDBA; the edges are labelled with the citations of the corresponding constructions.]

Fig. 1. LTL translations to different types of automata. Translations implemented in 
Rabinizer 4 are indicated with a solid line. The traditional approaches are depicted as 
dotted arrows. The determinization of NBA to DRA is implemented in ltl2dstar [Kle], 
to LDBA in Seminator [BDK+17] and to (mostly) DPA in spot [DLLF+16]. 


been mostly used [KNP11] and recently also deterministic generalized Rabin 
automata (DGRA) [CGK13]. In principle, all standard types of deterministic 
automata are applicable here except for deterministic Büchi automata (DBA), 
which are not as expressive as LTL. However, other types of automata, such 
as deterministic Muller and deterministic parity automata (DPA) are typically 
larger than DGRA in terms of acceptance condition or the state space, 
respectively.¹ Recently, several approaches with specific LDBA were proved 
applicable to the quantitative setting [HLS+15, SEJK16] and competitive with DGRA. 
Besides, model checking MDP against LTL properties involving frequency oper- 
ators [BDL12] also allows for an automata-theoretic approach, via deterministic 
generalized Rabin mean-payoff automata (DGRMA) [FKK15]. 


LTL synthesis can also be solved using the automata-theoretic approach. 
Although DRA and DGRA transformed into games can be used here, the 
algorithms for the resulting Rabin games [PP06] are not very efficient in 
practice. In contrast, DPA may be larger, but in this setting they are the 
automata of choice due to the good practical performance of parity-game solvers 
[FL09, ML16, JBB+17]. 


Types of Translations. The translations of LTL to NBA, e.g., [VW86], are 
typically “semantic” in the sense that each state is given by a set of logical formu- 
lae and the language of the state can be captured in terms of semantics of these 
formulae. In contrast, the determinization of Safra [Saf88] or its improvements 
[Pit06, Sch09, TD14, FL15] are not “semantic” in the sense that they ignore the 
structure and produce trees as the new states that, however, lack the logical inter- 
pretation. As a result, if we apply Safra's determinization on semantically created 
NBA, we obtain DRA that lack the structure and, moreover, are unnecessarily 
large since the construction cannot utilize the original structure. In contrast, the 


1 Note that every DGRA can be written as a Muller automaton on the same state 
space with an exponentially-sized acceptance condition, and DPA are a special case 
of DRA and thus DGRA. 
recent works [KE12, KLG13, EK14, KV15, SEJK16, EKRS17, MS17, KV17] provide 
“semantic” constructions, often producing smaller automata. Furthermore, 
various transformations such as degeneralization [KE12], index appearance 
record [KMWW17] or determinization of limit-deterministic automata [EKRS17] 
preserve the semantic description, allowing for further optimizations of the 
resulting automata. 


Our Contribution. While all previous versions of Rabinizer [GKE12, KLG13, 
KK14] featured only the translation LTL → DGRA → DRA, Rabinizer 4 now 
implements all the translations depicted by the solid arrows in Fig. 1. It improves 
all these translations, both algorithmically and implementation-wise, and more- 
over, features the first implementation of the translation of a frequency extension 
of LTL [FKK15]. 

Further, in order to utilize the resulting automata for verification, we provide 
our own distribution² of the PRISM model checker [KNP11], which allows for 
model checking MDP against LTL using not only DRA and DGRA, but also 
using LDBA and against frequency LTL using DGRMA. Finally, the tool can 
turn the produced DPA into parity games between the players with input and 
output variables. Therefore, when linked to parity-game solvers, Rabinizer 4 can 
be also used for LTL synthesis. 

Rabinizer 4 is freely available at http://rabinizer.model.in.tum.de together 
with an on-line demo, visualization, usage instructions and examples. 


2 Functionality 


We recall that the previous version Rabinizer 3 has the following functionality: 


— It translates LTL formulae into equivalent DGRA or DRA. 
— It is linked to PRISM, allowing for probabilistic verification using DGRA 
(previously PRISM could only use DRA). 


2.1 Translations 


Rabinizer 4 inputs formulae of LTL and outputs automata in the standard HOA 
format [BBD+15], which is used, e.g., as the input format in PRISM. Automata 
in the HOA format can be directly visualized, displaying the “semantic” description 
of the states. Rabinizer 4 features the following command-line tools for the 
respective translations depicted as the solid arrows in Fig. 1: 


ltl2dgra and ltl2dra correspond to the original functionality of Rabinizer 3, 
i.e., they translate LTL (now with the extended syntax, including all common 
temporal operators) to DGRA and DRA [EK14], respectively. 


² Merging these features into the public release of PRISM as well as linking the new 
version of Rabinizer is subject to current collaboration with the authors of PRISM. 
ltl2ldgba and ltl2ldba translate LTL to LDGBA using the construction of 
[SEJK16] and to LDBA, respectively. The latter is our modification of the 
former, which produces smaller automata than chaining the former with the 
standard degeneralization. 

ltl2dpa translates LTL to DPA using two modes: 

— The default mode uses the translation to LDBA, followed by an LDBA- 
to-DPA determinization [EKRS17] specially tailored to LDBA with the 
“semantic” labelling of states, avoiding additional exponential blow-up of 
the resulting automaton. 

— The alternative mode uses the translation to DRA, followed by our 
improvement of the index appearance record of [KMWW17]. 

fltl2dgrma translates the frequency extension of $\mathrm{LTL}_{\setminus \mathbf{GU}}$, i.e. $\mathrm{LTL}_{\setminus \mathbf{GU}}$ [KLG13] 
with the $\mathbf{G}^{\sim\rho}$ operator³, to DGRMA using the construction of [FKK15]. 
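For readers unfamiliar with the “standard degeneralization” that ltl2ldba is compared against, the following is a minimal sketch of the textbook counter construction for state-based generalized Büchi acceptance. It is not the construction implemented in Rabinizer 4; the function name `degeneralize` and the dictionary encoding are assumptions made for illustration. The sketch shows where the blow-up by a factor of k (the number of acceptance sets) comes from.

```python
def degeneralize(delta, acc_sets):
    """Textbook degeneralization of a generalized Buchi automaton.

    delta:    dict mapping (state, letter) to a set of successor states
    acc_sets: list [F_0, ..., F_{k-1}] of sets of accepting states
    Returns the transition relation over pairs (state, counter) and the
    single Buchi acceptance set.
    """
    k = len(acc_sets)
    if k == 0:
        raise ValueError("at least one acceptance set is required")
    new_delta = {}
    for (q, letter), successors in delta.items():
        for i in range(k):
            # The counter waits for F_i; it advances when leaving a state of F_i.
            j = (i + 1) % k if q in acc_sets[i] else i
            new_delta[((q, i), letter)] = {(q2, j) for q2 in successors}
    # Visiting F_0 with counter 0 infinitely often means the counter cycles
    # infinitely often, i.e. every F_i is visited infinitely often.
    accepting = {(q, 0) for q in acc_sets[0]}
    return new_delta, accepting

# Hypothetical two-state example with two acceptance sets.
delta = {("p", "a"): {"q"}, ("q", "a"): {"p"}}
new_delta, accepting = degeneralize(delta, [{"p"}, {"q"}])
```

Each original state is copied once per acceptance set, so an automaton with n states and k sets yields up to n · k states; a direct translation to LDBA can avoid some of these copies.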


2.2 Verification and Synthesis 


The resulting automata can be used for model checking probabilistic systems 
and for LTL synthesis. To this end, we provide our own distribution of the prob- 
abilistic model checker PRISM as well as a procedure transforming automata 
into games to be solved. 


Model checking: PRISM distribution. For model checking Markov chains 
and Markov decision processes, PRISM [KNP11] uses DRA and recently 
also more efficient DGRA [CGK13,KK14]. Our distribution, which links 
Rabinizer, additionally features model checking using the LDBA [SEJK16, 
SK16] that are created by our ltl2ldba. 

Further, the distribution provides an implementation of frequency $\mathrm{LTL}_{\setminus \mathbf{GU}}$ 
model checking, using DGRMA. To the best of our knowledge, there are no 
other implemented procedures for logics with frequency. Here, techniques of 
linear programming for multi-dimensional mean-payoff satisfaction [CKK15] 
and the model-checking procedure of [FKK15] are implemented and applied. 

Synthesis: Games. The automata-theoretic approach to LTL synthesis requires 
transforming the LTL formula into a game between the input and output players. 
We provide this transformer and thus an end-to-end LTL synthesis solution, 
provided a respective game solver is linked. Since current solutions to Rabin 
games are not very efficient, we implemented a transformation of DPA into 
parity games and a serialization to the format of PG Solver [FL09]. Due to 
the explicit serialization, we foresee the main use in quick prototyping. 
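To give an idea of what such an explicit serialization looks like, here is a minimal sketch that writes a parity game in the textual PG Solver format, assuming the usual layout: a header `parity <max id>;` followed by one line `<id> <priority> <owner> <successors> "<name>";` per node. The function `to_pgsolver`, the node encoding, and the assignment of the input and output players to owners 0 and 1 are assumptions for illustration, not Rabinizer's actual code.

```python
def to_pgsolver(nodes):
    """Serialize a parity game to the textual PG Solver format.

    nodes: dict mapping a node id (non-negative int) to a tuple
           (priority, owner, successors, name), with owner in {0, 1}.
    """
    lines = [f"parity {max(nodes)};"]
    for ident in sorted(nodes):
        priority, owner, successors, name = nodes[ident]
        succ = ",".join(str(s) for s in successors)
        lines.append(f'{ident} {priority} {owner} {succ} "{name}";')
    return "\n".join(lines)

# Hypothetical two-node game: node 0 belongs to player 0, node 1 to player 1.
game = {
    0: (2, 0, [1], "env"),
    1: (1, 1, [0, 1], "sys"),
}
text = to_pgsolver(game)
```

Because every node and edge is written out, the file size grows linearly with the game, which is consistent with the expectation that this route is mainly suited to prototyping.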


³ The frequential globally construct [BDL12, BMM14] $\mathbf{G}^{\sim\rho}\varphi$ with $\sim \in \{\geq, >, \leq, <\}$, 
$\rho \in [0,1]$ intuitively means that the fraction of positions satisfying 
$\varphi$ satisfies $\sim\rho$. Formally, the fraction on an infinite run is defined using the 
long-run average [BMM14]. 
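The intuition behind the frequency operator can be checked on ultimately periodic words, where the long-run average exists and depends only on the repeated loop. The sketch below is a simplification under two assumptions: the subformula is a plain predicate on single positions (not a general temporal formula), and the word is given by its loop. The names `long_run_frequency` and `frequency_globally` are hypothetical.

```python
from fractions import Fraction
import operator

COMPARATORS = {">=": operator.ge, ">": operator.gt,
               "<=": operator.le, "<": operator.lt}

def long_run_frequency(loop, holds):
    """Fraction of positions satisfying `holds` on the word prefix . loop^omega.

    Any finite prefix has no influence on the long-run average,
    so only the loop is inspected.
    """
    if not loop:
        raise ValueError("the loop must be non-empty")
    hits = sum(1 for letter in loop if holds(letter))
    return Fraction(hits, len(loop))

def frequency_globally(loop, holds, comparator, rho):
    """Evaluate the frequency operator with bound `comparator rho` on a lasso word."""
    return COMPARATORS[comparator](long_run_frequency(loop, holds), rho)

# Hypothetical loop in which proposition "a" holds in 3 of 4 positions.
loop = [{"a"}, set(), {"a"}, {"a"}]
has_a = lambda letter: "a" in letter
freq = long_run_frequency(loop, has_a)
```

With this loop the frequency is 3/4, so a bound of at least 1/2 is met while a bound strictly above 3/4 is not.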
3 Optimizations, Implementation, and Evaluation 


Compared to the theoretical constructions and previous implementations, there 
are numerous improvements, heuristics, and engineering enhancements. We eval- 
uate the improvements both in terms of the size of the resulting automaton as 
well as the running time. When comparing with respect to the original Rabinizer 
functionality, we compare our implementation ltl2dgra to the previous version 
Rabinizer 3.1, which is already a significantly faster [EKS16] re-implementation 
of the official release Rabinizer 3 [KK14]. All of the benchmarks have been exe- 
cuted on a host with i7-4700MQ CPU (4x2.4 GHz), running Linux 4.9.0-5-amd64 
and the Oracle JRE 9.0.4+11 JVM. Due to the start-up time of JVM, all times 
below 2s are denoted by <2 and not specified more precisely. All experiments 
were given a time-out of 900s and mem-out of 4GB, denoted by —. 


Algorithmic improvements and heuristics for each of the translations: 


ltl2dgra and ltl2dra. These translations create a master automaton monitoring 
the satisfaction of the given formula and a dedicated slave automaton for 
each subformula of the form $\mathbf{G}\psi$ [EK14]. We (i) simplify several classes of 
slaves and (ii) “suspend” (in the spirit of [BBDL+13]) some so that they 
appear in the final product only in some states. The effect on the size of 
the state space is illustrated in Table 1 on a nested formula. Further, (iii) 
the acceptance condition is considered separately for each strongly connected 
component (SCC) and then combined. On a concrete example of Table 2, 
the automaton for i = 8 has 31 atomic propositions, whereas the number of 
atomic propositions relevant in each component of the master automaton is 
constant, which we utilize and thus improve performance on this family both 
in terms of size and time. 

ltl2ldba. This translation is based on breakpoints for subformulae of the form 
$\mathbf{G}\psi$. We provide a heuristic that avoids breakpoints when $\psi$ is a safety or 
co-safety subformula, see Table 3. 
Besides, we add an option to generate a non-deterministic initial component 
for the LDBA instead of a deterministic one. Although the LDBA is then 
no longer suitable for quantitative probabilistic model checking, it still is for 
qualitative model checking. At the same time, it can be much smaller, see 
Table 4 which shows a significant improvement on the particular formula. 

ltl2dpa. Both modes inherit the improvements of the respective ltl2ldba and 
ltl2dgra translations. Further, since complementing DPA is trivial, we can 
run in parallel both the translation of the input formula and of its negation, 
returning the smaller of the two results. Finally, we introduce several heuris- 
tics to optimize the treatment of safety subformulae of the input formula. 

dra2dpa. The index appearance record of [KMWW17] keeps track of a permutation 
(ordering) of Rabin pairs. To do so, all ties between pairs have to be 
resolved. In our implementation, we keep a pre-order instead, where irrelevant 
ties are not resolved. Consequently, it cannot happen that an irrelevant tie 
is resolved in two different ways like in [KMWW17], thus effectively merging 
such states. 
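The remark in the ltl2dpa item that complementing a DPA is trivial can be made concrete. The sketch below assumes transition-based priorities and the max-even convention (a run is accepting iff the largest priority seen infinitely often is even); under that convention, adding one to every priority complements the language. Both the convention and the names `complement` and `accepts_loop` are assumptions for illustration and need not match the conventions used inside Rabinizer 4.

```python
def complement(priorities):
    """Complement a deterministic parity automaton by shifting all priorities.

    priorities: dict mapping an edge (source, letter, target) to its priority.
    Shifting by one swaps even and odd, and hence acceptance and rejection.
    """
    return {edge: p + 1 for edge, p in priorities.items()}

def accepts_loop(priorities, loop_edges):
    """Max-even acceptance of a run that eventually repeats `loop_edges` forever."""
    return max(priorities[edge] for edge in loop_edges) % 2 == 0

# Hypothetical two-edge loop with priorities 1 and 2.
priorities = {("q0", "a", "q1"): 1, ("q1", "b", "q0"): 2}
loop = list(priorities)
before = accepts_loop(priorities, loop)
after = accepts_loop(complement(priorities), loop)
```

Since the state space is unchanged by this shift, translating both the formula and its negation and keeping the smaller automaton costs only the second translation, not an additional complementation step.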


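Table 1. Effect of simplifications and suspension for ltl2dgra on the formulae 
$\psi_i = \mathbf{G}\varphi_i$ where $\varphi_1 = a_1$, $\varphi_i = (a_i \mathbf{U} (\mathbf{X}\varphi_{i-1}))$, and $\psi'_i = \mathbf{G}\varphi'_i$ where 
$\varphi'_1 = a_1$, $\varphi'_i = (\varphi'_{i-1} \mathbf{U} (\mathbf{X}^i a_i))$, displaying execution time in seconds/#states. 


                      | $\psi_2$  | $\psi_3$  | $\psi_4$  | $\psi_5$  | $\psi_6$ 
Rabinizer 3.1 [EKS16] | <2/4  | <2/16 | <2/73 | 3/332   | 60/1463 
ltl2dgra              | <2/3  | <2/7  | <2/35 | 3/199   | 13/1155 

                      | $\psi'_2$ | $\psi'_3$ | $\psi'_4$ | $\psi'_5$ | $\psi'_6$ 
Rabinizer 3.1 [EKS16] | <2/4  | <2/16 | 2/104 | 128/670 | — 
ltl2dgra              | <2/3  | <2/10 | <2/38 | 7/175   | 239/1330 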

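Table 2. Effect of computing acceptance sets per SCC on formulae $\psi_1 = x_1 \wedge \varphi_1$, 
$\psi_2 = (x_1 \wedge \varphi_1) \vee (\neg x_1 \wedge \varphi_2)$, 
$\psi_3 = (x_1 \wedge x_2 \wedge \varphi_1) \vee (\neg x_1 \wedge x_2 \wedge \varphi_2) \vee (x_1 \wedge \neg x_2 \wedge \varphi_3)$, ..., 
where $\varphi_i = \mathbf{XG}((a_i \mathbf{U} b_i) \vee (c_i \mathbf{U} d_i))$, displaying execution time in seconds/#acceptance 
sets. 


                      | $\psi_1$ | $\psi_2$ | $\psi_3$  | $\psi_4$ | $\psi_5$ | ... | $\psi_8$ 
Rabinizer 3.1 [EKS16] | <2/2 | <2/7 | <2/19 | —    | —    | ... | — 
ltl2dgra              | <2/1 | <2/1 | <2/1  | <2/1 | <2/1 | ... | <2/1 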


Table 3. Effect of break-point elimination for ltl2ldba on safety formulae 
$s(n,m) = \bigwedge_{i=1}^{n} \mathbf{G}(a_i \vee \mathbf{X}^m b_i)$ and for ltl2ldgba on liveness formulae 
$l(n,m) = \bigwedge_{i=1}^{n} \mathbf{GF}(a_i \wedge \mathbf{X}^m b_i)$, displaying #states (#Büchi conditions) 


          | s(1,3) | s(2,3)  | s(3,3)           | s(4,3)            | s(1,4) | s(2,4)   | s(3,4)     | s(4,4) 
[SEJK16]  | 20 (1) | 400 (2) | $8 \cdot 10^3$ (3) | $16 \cdot 10^4$ (4) | 48 (1) | 2304 (2) | 110592 (3) | — 
ltl2ldba  | 8 (1)  | 64 (1)  | 512 (1)          | 4096 (1)          | 16 (1) | 256 (1)  | 4096 (1)   | 65536 (1) 

          | l(1,1) | l(2,1)  | l(3,1)           | l(4,1)            | l(1,4) | l(2,4)   | l(3,4)     | l(4,4) 
[SEJK16]  | 3 (1)  | 9 (2)   | 27 (3)           | 81 (4)            | 10 (1) | 100 (2)  | $10^3$ (3)   | $10^4$ (4) 
ltl2ldgba | 3 (1)  | 5 (2)   | 9 (3)            | 17 (4)            | 3 (1)  | 5 (2)    | 9 (3)      | 17 (4) 


Table 4. Effect of non-determinism of the initial component for ltl2ldba on formulae 
$f(i) = \mathbf{F}(a \wedge \mathbf{X}^i \mathbf{G} b)$, displaying #states (#Büchi conditions) 


         | f(1)  | f(2)  | f(3)   | f(4)   | f(5)   | f(6) 
[SEJK16] | 4 (1) | 6 (1) | 10 (1) | 18 (1) | 34 (1) | 66 (1) 
ltl2ldba | 2 (1) | 3 (1) | 4 (1)  | 5 (1)  | 6 (1)  | 7 (1) 
Table 5. Comparison of the average performance with the previous version of 
Rabinizer. The statistics are taken over a set of 200 standard formulae [KMS18] used, 
e.g., in [BKS13, EKS16], run in a batch mode for both tools to eliminate the effect of 
the JVM start-up overhead. 


Tool                  | Avg # states | Avg # acc. sets | Avg runtime 
Rabinizer 3.1 [EKS16] | 6.3          | 6.7             | 0.23 
ltl2dgra              | 6.2          | 4.4             | 0.12 


Implementation. The main performance bottleneck of the older implementa- 
tions is that explicit data structures for the transition system are not efficient 
for larger alphabets. To this end, Rabinizer 3.1 provided symbolic (BDD) rep- 
resentation of states and edge labels. On top of that, Rabinizer 4 represents the 
transition function symbolically, too. 

Besides, there are further engineering improvements on issues such as storing 
the acceptance condition only as a local edge labelling, caching, data-structure 
overheads, SCC-based divide-and-conquer constructions, or the introduction of 
parallelization for batch inputs. 


Average Performance Evaluation. We have already illustrated the improve- 
ments on several hand-crafted families of formulae. In Tables 1 and 2 we have 
even seen the respective running-time speed-ups. As the basis for the overall eval- 
uation of the improvements, we use some established datasets from literature, see 
[KMS18], altogether two hundred formulae. The results in Table 5 indicate that 
the performance improved also on average among the more realistic formulae. 


4 Conclusion 


We have presented Rabinizer 4, a tool set to translate LTL to various determin- 
istic automata and to use them in probabilistic model checking and in synthesis. 
The tool set extends the previous functionality of Rabinizer, improves on previous 
translations, and also gives the very first implementations of frequency LTL 
translation as well as model checking. Finally, the tool set is also more user-friendly 
due to richer input syntax, its connection to PRISM and PG Solver, 
and the on-line version with direct visualization, which can be found at 
http://rabinizer.model.in.tum.de. 
Abstract. STRIX is a new tool for reactive LTL synthesis combining 
a direct translation of LTL formulas into deterministic parity automata 
(DPA) and an efficient, multi-threaded explicit state solver for parity 
games. In brief, STRIX (1) decomposes the given formula into simpler 
formulas, (2) translates these on-the-fly into DPAs based on the queries 
of the parity game solver, (3) composes the DPAs into a parity game, and 
at the same time already solves the intermediate games using strategy 
iteration, and (4) finally translates the winning strategy, if it exists, into 
a Mealy machine or an AIGER circuit with optional minimization using 
external tools. We experimentally demonstrate the applicability of our 
approach by a comparison with PARTY, BoSy, and LTLSYNT using the 
SYNTCOMP2017 benchmarks. In these experiments, our prototype can 
compete with BoSy and LTLSYNT with only PARTY performing slightly 
better. In particular, our prototype successfully synthesizes the full and 
unmodified LTL specification of the AMBA protocol for n = 2 masters. 


1 Introduction 


Reactive synthesis refers to the problem of finding, for a formal specification of 
an input-output relation, in our case given in linear temporal logic (LTL), a matching 
implementation [22], e.g. a Mealy machine or an and-inverter graph (AIG). 
Since the automata-theoretic approach to synthesis involves the construction of 
a potentially double exponentially sized automaton (in the length of the spec- 
ification) [13], most existing tools focus on symbolic and bounded methods in 
order to combat the state-space explosion [5,9, 11, 18]. A beneficial side effect of 
these approaches is that they tend to yield succinct implementations. 

In contrast to these approaches, we present a prototype implementation of 
an LTL synthesis tool which follows the automata theoretic approach using par- 
ity games as an intermediate step. STRIX¹ uses the LTL-to-DPA translation 
presented in [10,23] and the multi-threaded explicit-state parity game solver 
presented in [14,20]: First, the given formula is decomposed into much simpler 
requirements, often resulting in a large number of safety and co-safety condi- 
tions and only a few requiring Büchi or parity acceptance conditions, which is 
comparable to the approach of [5,21]. These requirements are then translated 
on-the-fly into automata, keeping the invariant that the parity game solver can 
easily compose the actual parity game. Further, by querying only for states that 
are actually required for deciding the winner, the implementation avoids unnec- 
essary work. 

The parity game solver is based on the strategy iteration of [19] which itera- 
tively improves non-deterministic strategies, i.e. strategies that can allow several 
actions for a given state as long as they all are guaranteed to lead to the specified 
system behaviour. When translating the winning strategy into a Mealy automa- 
ton or an AIG this non-determinism can be used similarly to “don’t cares” when 
minimizing boolean circuits. Strategy iteration offers us two additional advan- 
tages, first, we can directly take advantage of multi-core systems; second, we 
can reuse the winning strategies which have been computed for the intermediate 
arenas. 


Related Work and Experimental Evaluation. From the tools submitted to SYNT- 
COMP2017, LTLSYNT [15] is closest to our approach: it also combines an LTL- 
to-DPA-translation with an explicit-state parity game solver, but it does not 
intertwine the two steps. Instead, it uses a different approach for the translation, 
leading to one monolithic DPA, which is then turned into a parity game. In contrast, 
the two best-performing tools from SYNTCOMP2017, BoSy and PARTY, 
use bounded synthesis, by reduction to SAT, SMT, or safety games. 

In order to give a realistic estimate of how our tool would have fared at 
SYNTCOMP2017 (TLSF/LTL track), we tried to re-create the benchmark environment 
of SYNTCOMP2017 as closely as possible on our hardware: in its current 
state, our tool would have been ranked below PARTY, but ahead of LTLSYNT and 
BoSy. Due to time and resource constraints, we could only do an in-depth comparison 
with the current version of LTLSYNT; in particular we used the TLSF 
specification of the complete² AMBA protocol for n = 2 as a benchmark. We 
refer to Sect. 3 for details on the benchmarking procedure. 


2 Design and Implementation 


STRIX is implemented in Java and C++. It supports LTL and TLSF [16] (only 
the reduced basic variant) as input languages; the latter is preferred, 
since it contains more information about the specification. We describe the main 
steps of the tool in the following paragraphs with examples given in Fig. 1. 


² I.e., no decomposition into masters and clients and no structural properties were used. 
Splitting and Translation. As a preprocessing step the specification is split into 
syntactic (co)safety and (co)Büchi formulas, and one remaining general LTL for- 
mula. These are then translated into the simplest deterministic automaton class 
using the constructions of [10,23]. To speed up the process these automata are 
constructed on-the-fly, i.e., states are created only if requested by later stages. 
Furthermore, since DPAs can be easily complemented, the implementation translates 
both the formula and its negation and keeps whichever result is obtained first. 
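The splitting step can be illustrated with a simplified sketch. It classifies the top-level conjuncts of a formula in negation normal form as syntactic safety (no F, U or M operators), syntactic co-safety (no G, R or W operators), or general. The tuple encoding of formulas and the function names are assumptions for illustration; the actual preprocessing in STRIX also recognizes (co)Büchi formulas, which this sketch lumps into the general class.

```python
SAFETY_FORBIDDEN = {"F", "U", "M"}    # operators that express eventualities
COSAFETY_FORBIDDEN = {"G", "R", "W"}  # operators that express invariants

def operators(formula):
    """Collect the operator symbols of a formula given as nested tuples.

    Atoms are ("ap", name); negation is assumed to occur only in front of atoms.
    """
    if formula[0] == "ap":
        return set()
    result = {formula[0]}
    for sub in formula[1:]:
        result |= operators(sub)
    return result

def classify(formula):
    ops = operators(formula)
    if not ops & SAFETY_FORBIDDEN:
        return "safety"
    if not ops & COSAFETY_FORBIDDEN:
        return "co-safety"
    return "general"

def conjuncts(formula):
    """Flatten nested binary conjunctions into a list of conjuncts."""
    if formula[0] == "and":
        return conjuncts(formula[1]) + conjuncts(formula[2])
    return [formula]

def split(formula):
    groups = {"safety": [], "co-safety": [], "general": []}
    for conjunct in conjuncts(formula):
        groups[classify(conjunct)].append(conjunct)
    return groups

# Hypothetical specification: G(!g0 | !g1) & F done & G(!r0 | F g0)
mutex = ("G", ("or", ("not", ("ap", "g0")), ("not", ("ap", "g1"))))
finish = ("F", ("ap", "done"))
response = ("G", ("or", ("not", ("ap", "r0")), ("F", ("ap", "g0"))))
groups = split(("and", mutex, ("and", finish, response)))
```

In this example only the response property needs a more expressive acceptance condition, while the other two conjuncts fall into the simpler classes.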
Fig. 1. Synthesis of a simple arbiter with two clients. Here, a winning strategy is already 
obtained on the partial arena: always take any of the non-dashed edges. 


Arena Construction. Here we construct one product automaton and combine 
the various acceptance conditions into a single parity acceptance condition: for 
this, we use the idea underlying the last-appearance-record construction, known 
from the translation of Muller to parity games, to directly obtain a parity game 
again. 


Parity Game Solving. The parity game solver runs in parallel to the arena 
construction on the partially constructed game in order to guide the translation 
process, with the possibility for early termination when a winning strategy for the 
system player is found. It uses strategy iteration that supports non-deterministic 
strategies [19] from which we can benefit in several ways: First, in the translation 
process, the current strategy stays valid when adding nodes to the arena and 
thus can be used as initial strategy when solving the extended arena. Second, the 
non-deterministic strategies allow us to later heuristically select actions of the 
strategy that minimize the generated controller and to identify irrelevant output 
signals (similar to “don’t care” cells in Karnaugh maps). Finally, the strategy 
iteration can easily take advantage of multi-core architectures [14, 20]. 


Controller Generation and Minimization. From the non-deterministic strategy 
we obtain an incompletely specified Mealy machine and optionally pass it to 
the external SAT-based minimizer MEMIN [1] for Mealy machines and extract 
a more compact description. 


AIGER Circuit Generation and Minimization. We translate the minimized 
Mealy machine with the tool SPECULOOS³ into an AIGER circuit. In parallel, 
we also construct an AIGER circuit out of the non-minimized Mealy machine, 
since this can sometimes result in smaller circuits. The two AIGER circuits are 
then further compressed using ABC [6], and the smaller one is returned. 


3 Experimental Evaluation 


We evaluate STRIX on the TLSF/LTL-track benchmark of the SYNTCOMP2017 
competition, which consists of 177 realizable and 67 unrealizable temporal logic 
synthesis specifications [15]. The experiment was run on a server with an Intel 
E5-2630 v4 clocked at 2.2GHz (boost disabled). To mimic SYNTCOMP2017 we 
imposed a limit of 8 threads for parallelization, a memory limit of 32GB and 
a timeout of one hour for each specification. Every specification for which a tool 
correctly decides realizability within these limits is counted as solved for the 
category Realizability, and every specification for which it can additionally produce 
an AIGER circuit that is successfully verified is counted as solved for the 
category Synthesis. For this we verified the circuits with an additional time 
limit of one hour using the NUXMV model checker [7] with the check_ltlspec 
and check_ltlspec_klive routines in parallel. 

We compared STRIX with LTLSYNT in the latest available release (version 2.5) 
at time of writing. This version differs from the one used during SYNTCOMP2017 
as it contains several improvements, but also performs worse in a few cases and 
exhibits erroneous behaviour: for Realizability, it produced one wrong answer, 
and for Synthesis, it failed in 72 cases to produce AIGER circuits due to a 
program error. 

Additionally, we compare our results with the best configuration of the top 
tools competing in SYNTCOMP2017: PARTY (portfolio), LTLSYNT and BoSy 
(spot). Due to the difficulty of recreating the SYNTCOMP2017 hardware setup⁴, 
we compiled the results for these tools in Table 1 from the SYNTCOMP2017 webpage⁵, 
combining them with our results. 


³ https://github.com/romainbrenguier/Speculoos 

⁴ SYNTCOMP2017 was run on an Intel E3-1271 v3 (4 cores/8 threads) at 3.6 GHz 
with 32 GB of RAM available for the tools. As stated above, we imposed the same 
constraints regarding timeout, maximal number of threads, and memory limit; but 
the Intel E3-1271 v3 runs at 3.6 GHz (with boost 4.0 GHz), while the Intel E5-2630 
v4 used by us runs at only 2.2 GHz (boost disabled), resulting in a lower per-thread 
performance (potentially 30% slower); on the other hand our system has a larger 
cache and a theoretically much higher memory bandwidth of up to 68.3 GB/s 
compared to 25.6 GB/s (for random reads, as in the case of dynamically generated 
parity games, these numbers are much closer). It therefore seems likely that for some 
benchmark-tool combinations our system is faster while for others it is slower. 

⁵ http://syntcomp.cs.uni-saarland.de/syntcomp2017/experiments/ 
The Quality rating compares the size of the solutions according to the SYNTCOMP2017 
formula, where a tool gets $2 - \log_{10}\frac{n+1}{r+1}$ quality points for each verified 
solution of size n for a specification with reference size r. We now move on 
to a detailed discussion of the results and their interpretation. 
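To make the quality rating tangible, the following sketch evaluates the score for a few sizes, assuming the SYNTCOMP2017 formula reads 2 − log₁₀((n+1)/(r+1)) for a verified solution of size n and reference size r. The function name `quality_points` and the sample sizes are illustrative assumptions, not values from the competition.

```python
import math

def quality_points(n, r):
    """Quality points for a verified solution of size n and reference size r.

    Assumes the formula 2 - log10((n + 1) / (r + 1)): matching the reference
    gives 2 points, and each factor of ten in size shifts the score by one.
    """
    if n < 0 or r < 0:
        raise ValueError("sizes must be non-negative")
    return 2 - math.log10((n + 1) / (r + 1))

same_size = quality_points(100, 100)      # solution as large as the reference
ten_times_smaller = quality_points(9, 99)
ten_times_larger = quality_points(999, 99)
```

Under this assumption a solution that matches the reference earns 2 points, one that is ten times smaller earns 3, and one that is ten times larger earns 1, so the total rewards both solving many instances and producing small circuits.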


Table 1. Results for STRIX compared with LTLSYNT and selected results from SYNT- 
COMP2017 on the TLSF/LTL-track benchmark and on notable instances. We mark 
timeouts by TIME, memouts by MEM, and errors by ERR. 


                      | Our system              | SYNTCOMP2017 
                      | STRIX  | LTLSYNT (2.5)  | PARTY  | LTLSYNT | BoSy 
Realizability         | 214    | 204            | 224    | 195     | 181 
Synthesis             | 197    | 123            | 203    | 182     | 181 
Quality               | 330    | 136            | 308    | 180     | 298 
Avg. Quality          | 1.68   | 1.10           | 1.52   | 0.99    | 1.64 

full_arbiter_7        | 11.34  | MEM            | 8.77   | MEM     | TIME 
prioritized_arbiter_7 | 58.53  | TIME           | 372.95 | TIME    | TIME 
round_robin_arbiter_6 | 8.45   | 158.33         | TIME   | 733.92  | TIME 
ltl2dba_E_10          | 6.79   | 324.84         | TIME   | TIME    | TIME 
ltl2dba_Q_8           | 2.13   | 346.12         | TIME   | TIME    | TIME 

amba_..._encode_12    | 89     | ERR            | 1040   | 3251    | 369 
full_arbiter_5        | 531    | ERR            | 2257   | 7393    | TIME 
full_arbiter_6        | 626    | ERR            | 7603   | 26678   | TIME 
ltl2dba_E_4           | 7      | 406            | 243    | 406     | TIME 
ltl2dba_E_6           | 11     | 3952           | 1955   | 3952    | TIME 


Realizability. We were able to correctly decide realizability for 163 and unre- 
alizability for 51 specifications, resulting in 214 solved instances. We solve five 
instances that were previously unsolved in SYNTCOMP2017. 


Synthesis. We produced AIGER circuits for 148 of the realizable specifications. 
In 15 cases, we only constructed a Mealy machine, but the subsequent steps 
(MEMIN for minimization or SPECULOOS for circuit generation) reached the 
time or memory limit. We were able to verify correctness for 146 of the cir- 
cuits, reaching the model checking time limit in two cases. Together with the 51 
specifications for which we determined unrealizability, this results in 197 solved 
instances. 


Quality. We produced 36 solutions that are smaller than any solution during 
SYNTCOMP2017. The most significant reductions are for the AMBA encoder 
and the full arbiter, with reductions of over 75%, and for ltl2dba_E_4 and 
ltl2dba_E_6, where we indeed produce the smallest implementation there is. 
3.1 Effects of Minimization 


We could reduce the size of the Mealy machine in 80 cases, and on average 
by 45%. However, the data showed that this did not always reduce the size 
of the generated AIGER circuit: in 13 cases (most notably for several arbiter 
specifications) the size of the circuit generated from the Mealy machine actually 
increased when applying minimization (on average by 190%), while it decreased 
in 62 cases (on average by 55%). 

We conjecture that the structure of the product-arena is sometimes amenable 
to compact representation in an AIGER circuit, while after the (SAT-based) 
minimization this is lost. In these cases the SAT/SMT-based bounded synthesis 
tools such as BoSy and PARTY also have difficulties producing a small solution, 
if any at all. 


3.2 Synthesis of Complete AMBA AHB Arbiter 


To test maturity and scalability of our tool, we synthesized the AMBA AHB 
arbiter [2], a common case study for reactive synthesis. We used the parameter- 
ized specification from [17] for n = 2 masters, which was also part of SYNT- 
COMP 2016, but was left unsolved by any tool. With a memory limit of 128 GB, 
we could decide realizability within 26 min and produce a Mealy machine with 
83 states after minimization. While specialised GR(1) solvers [2,4,12] or decom- 
positional approaches [3] are able to synthesize the specification in a matter of 
minutes, to the best of our knowledge we are the first full LTL synthesis tool that 
can handle the complete non-decomposed specification in a reasonable amount 
of time. For comparison, LTLSYNT (2.5) needs more than 2.5 days on our system 
and produces a Mealy machine with 340 states. 


3.3 Discussion 


The LTLSYNT tool is part of Spot [8], which uses a Safra-style determinization 
procedure for NBAs. Conceptually, it also uses DPAs and a parity game solver as 
a decision procedure. However, as shown in [10] the produced automata tend to 
be larger compared to our translation, which probably results in the lower quality 
score. Our approach has similar performance and scales better on certain cases. 
The instances where LTLSYNT performs better than STRIX are specifications that 
we cannot split efficiently and the DPA construction becomes the bottleneck. 

Bounded synthesis approaches (BoSy, PARTY) tend to produce smaller 
Mealy machines and to be able to handle larger alphabets. However, they fail 
when the minimal machine implementing the desired property is large, even if 
there is a compact implementation as a circuit. In our approach, we can often 
solve these cases and still regain compactness of the implementation through 
minimization afterwards. The strength of the PARTY portfolio is the combina- 
tion of traditional bounded synthesis and a novel approach by reduction to safety 
games, which results in a large number of solved instances, but reduces the avg. 
quality score. 
Future Work. STRIX combines Java (LTL simplification and automata trans- 
lations) and C++ (parity game construction and solving). We believe that a 
pure C++ implementation will further improve the overall runtime and reduce 
the memory footprint. Next, there are several algorithmic questions we want 
to investigate going forward, especially expanding parallelization of the tool. 
Furthermore, we want to reduce the dependency on external tools for circuit 
generation in order to be able to fine-tune this step better. Especially replac- 
ing SPECULOOS is important, since it turned out that it was unable to handle 
complex transition systems. 


Btor2, BtorMC and Boolector 3.0 


Aina Niemetz¹,², Mathias Preiner¹,², 
Clifford Wolf³, and Armin Biere¹ 


¹ Johannes Kepler University Linz, Linz, Austria 
² Stanford University, Stanford, USA 
niemetz@cs.stanford.edu 
³ Symbiotic EDA, Vienna, Austria 


Abstract. We describe BTOR2, a word-level model checking format for 
capturing models of hardware and potentially software in a bit-precise 
manner. This simple, line-based and easy to parse format can be seen as 
a sorted extension of the word-level format BTOR. It uses design princi- 
ples from the bit-level format AIGER and follows semantics of the SMT- 
LIB logics of bit-vectors with arrays. This intermediate format can be 
used in various verification flows and is perfectly suited to establish a 
word-level model checking competition. It is supported by our new open 
source model checker BtorMC, which is built on top of version 3.0 of our 
SMT solver Boolector. We further provide new word-level benchmarks 
on which these open source tools are evaluated. 


Our format BTOR2 generalizes and extends the BTOR [5] format, which can be
seen as a word-level generalization of the initial version of the bit-level format
AIGER [2]. BTOR is a format for quantifier-free formulas over bit-vectors and
arrays with SMT-LIB [1] semantics, but also provides sequential extensions for
specifying word-level model checking problems with registers and memories. In
contrast to BTOR, which is tailored towards bit-vectors and one-dimensional
bit-vector arrays, BTOR2 has explicit sort declarations. It further allows registers
and memories to be initialized explicitly (instead of the implicit initialization in
BTOR) and extends the set of sequential features with witnesses, invariant and
fairness constraints, and liveness properties. All of these are word-level variants
lifted from corresponding features in the latest AIGER format [4], the input format
of the hardware model checking competition (HWMCC) [3,6] since 2011. We provide
an open source BTOR2 tool suite, which includes a generic parser, a random
simulator and a witness checker. We further implemented a reference bounded model
checker, BtorMC, on top of our SMT solver Boolector. We consider BTOR2 an
ideal candidate to establish a word-level hardware model checking competition.


1 Format Description 


The syntax of BTOR2 is shown in Fig. 1. The sort keyword is used to define
arbitrary bit-vector and array sorts. This not only allows specifying multi-dimensional
    <num>     ::= positive unsigned integer (greater than zero)
    <uint>    ::= unsigned integer (including zero)
    <string>  ::= sequence of whitespace and printable characters without '\n'
    <symbol>  ::= sequence of printable characters without '\n'
    <comment> ::= ';' <string>
    <nid>     ::= <num>
    <sid>     ::= <num>
    <const>   ::= 'const' <sid> [0-1]+
    <constd>  ::= 'constd' <sid> ['-']<uint>
    <consth>  ::= 'consth' <sid> [0-9a-fA-F]+
    <input>   ::= ('input' | 'one' | 'ones' | 'zero') <sid> | <const> | <constd> | <consth>
    <state>   ::= 'state' <sid>
    <bitvec>  ::= 'bitvec' <num>
    <array>   ::= 'array' <sid> <sid>
    <node>    ::= <sid> 'sort' ( <array> | <bitvec> )
                | <nid> ( <input> | <state> )
                | <nid> <opidx> <sid> <nid> <uint> [<uint>]
                | <nid> <op> <sid> <nid> [<nid> [<nid>]]
                | <nid> ( 'init' | 'next' ) <sid> <nid> <nid>
                | <nid> ( 'bad' | 'constraint' | 'fair' | 'output' ) <nid>
                | <nid> 'justice' <num> ( <nid> )+
    <line>    ::= <comment> | <node> [ <symbol> ] [ <comment> ]
    <btor>    ::= ( <line>'\n' )+


Fig. 1. Syntax of BTOR2. Non-terminals (opidx) and (op) are indexed and non-indexed 
operators as defined in Table 1 (sequential part in red). (Color figure online) 


arrays but can be extended to support (uninterpreted) functions, floating points 
and other sorts. As a consequence, BTOR2 is not backwards compatible with 
BTOR. For clarity, in Fig. 1 we distinguish between node (line) identifiers (nid)
and sort identifiers (sid), and do not allow an identifier to occur in both sets. 
Introducing sorts renders type specific keywords such as var, array and acond from 
BTOR obsolete. Instead, BTOR2 uses the keyword input to declare bit-vector and 
array variables of a given sort. Bit-vector constants are created as in BTOR with 
the keywords const[dh], one, ones and zero. 
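
As a rough illustration of how little machinery the line-based syntax needs, the following Python sketch splits BTOR2 text into sort declarations and nodes. It is not the parser from the BTOR2 tool suite: the names `Node` and `parse_btor2` are hypothetical, no type checking is done, and, as a simplifying assumption, everything after the first ';' on a line is treated as a comment.

```python
from typing import Dict, List, NamedTuple, Tuple


class Node(NamedTuple):
    nid: int         # node identifier
    kind: str        # keyword such as 'input', 'zero', 'add', 'bad'
    args: List[str]  # remaining tokens, left uninterpreted in this sketch


def parse_btor2(text: str) -> Tuple[Dict[int, tuple], List[Node]]:
    """Split BTOR2 text into sort declarations and nodes (no type checking)."""
    sorts: Dict[int, tuple] = {}
    nodes: List[Node] = []
    for raw in text.splitlines():
        # Simplification: treat everything after ';' as a comment.
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        tok = line.split()
        ident, kind = int(tok[0]), tok[1]
        if kind == "sort":
            if tok[2] == "bitvec":
                sorts[ident] = ("bitvec", int(tok[3]))
            elif tok[2] == "array":
                # index sort id, element sort id
                sorts[ident] = ("array", int(tok[3]), int(tok[4]))
            else:
                raise ValueError("unknown sort kind: " + tok[2])
        else:
            nodes.append(Node(ident, kind, tok[2:]))
    return sorts, nodes


EXAMPLE = """\
; two bit-vector sorts, an input, a constant and an array sort
1 sort bitvec 1
2 sort bitvec 32
3 input 1 turn
4 zero 2
5 sort array 2 2
"""

sorts, nodes = parse_btor2(EXAMPLE)
print(sorts)
print(nodes)
```

Keeping sorts and nodes in separate tables mirrors the distinction between sid and nid made in Fig. 1.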

Bit-vector and array operators as supported by BTOR2 and their respective
sorts are shown in Table 1. We use $B^n$ for a bit-vector sort of width $n$, and $I$
and $E$ for the index and element sorts of an array sort $A^{I \to E}$. Note that some
bit-vector operators can be interpreted as signed or unsigned. In signed context,
as in SMT-LIB, bit-vectors are represented in two's complement.
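
To make the signed/unsigned distinction concrete, the following Python sketch interprets the same 4-bit pattern both ways. The helper names `to_signed`, `ugt` and `sgt` are illustrative only; they mimic the unsigned and signed greater-than operators from Table 1 on plain integers.

```python
def to_signed(value: int, width: int) -> int:
    """Interpret an unsigned bit pattern of the given width in two's complement."""
    sign_bit = 1 << (width - 1)
    return value - (1 << width) if value & sign_bit else value


def ugt(a: int, b: int) -> bool:
    """Unsigned greater-than on bit patterns."""
    return a > b


def sgt(a: int, b: int, width: int) -> bool:
    """Signed greater-than: compare the two's complement interpretations."""
    return to_signed(a, width) > to_signed(b, width)


a, b = 0b1111, 0b0001          # 15 and 1 unsigned; -1 and 1 signed
print(to_signed(a, 4))         # -1
print(ugt(a, b), sgt(a, b, 4)) # True False
```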


2 Sequential Extension 


As shown in Fig. 1, the sequential extension of BTOR2 introduces a state keyword,
which allows registers and memories to be specified. In contrast to BTOR, where
registers are implicitly zero-initialized and memories are uninitialized, BTOR2
provides a keyword init to explicitly define initialization functions for states. This
enables us to also model partial initialization. For example, initializing a memory
with a bit-vector constant zero zero-initializes the whole memory, whereas
Table 1. Operators supported by BTOR2, where $B^n$ represents a bit-vector sort of size
$n$ and $A^{I \to E}$ represents an array sort with index sort $I$ and element sort $E$.


Operator                                | Class                    | Sort
indexed
  [su]ext w                             | (un)signed extension     | $B^n \to B^{n+w}$
  slice u l                             | extraction, $n > u \ge l$ | $B^n \to B^{u-l+1}$
unary
  not                                   | bit-wise                 | $B^n \to B^n$
  inc, dec, neg                         | arithmetic               | $B^n \to B^n$
  redand, redor, redxor                 | reduction                | $B^n \to B^1$
binary
  iff, implies                          | Boolean                  | $B^1 \times B^1 \to B^1$
  eq, neq                               | (dis)equality            | $S \times S \to B^1$
  [su]gt, [su]gte, [su]lt, [su]lte      | (un)signed inequality    | $B^n \times B^n \to B^1$
  and, nand, nor, or, xnor, xor         | bit-wise                 | $B^n \times B^n \to B^n$
  rol, ror, sll, sra, srl               | rotate, shift            | $B^n \times B^n \to B^n$
  add, mul, [su]div, smod, [su]rem, sub | arithmetic               | $B^n \times B^n \to B^n$
  [su]addo, [su]divo, [su]mulo, [su]subo | overflow                | $B^n \times B^n \to B^1$
  concat                                | concatenation            | $B^n \times B^m \to B^{n+m}$
  read                                  | array read               | $A^{I \to E} \times I \to E$
ternary
  ite                                   | conditional              | $B^1 \times B^n \times B^n \to B^n$
  write                                 | array write              | $A^{I \to E} \times I \times E \to A^{I \to E}$


partially initializing a register can be achieved by applying a bit-mask to an 
uninitialized register. 
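
The bit-mask idea can be illustrated on plain integers. In the Python sketch below, a random 8-bit value stands in for an uninitialized register, and the (hypothetical) mask forces the lower four bits to zero while leaving the upper four arbitrary. In a BTOR2 model the same effect would presumably be expressed with an and node used as the init function, which is an assumption of this sketch rather than something the text spells out.

```python
import random

WIDTH = 8
MASK = 0b11110000  # upper four bits stay arbitrary, lower four are forced to zero


def partial_init(uninitialized: int) -> int:
    """Apply the bit-mask to an arbitrary (uninitialized) register value."""
    return uninitialized & MASK


# Random values play the role of the unconstrained initial register contents.
samples = [partial_init(random.getrandbits(WIDTH)) for _ in range(1000)]
print(all((v & 0b00001111) == 0 for v in samples))  # lower bits are always zero
```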

Transition functions for both registers and memories are defined with the
next keyword. It takes the current and next states as arguments. A state variable 
without associated next function is treated as a primary input, i.e., it has the 
same behaviour as inputs defined via keyword input. Note that BTOR provides 
a next keyword for registers and an anext keyword for memories. Using sorts in 
BTOR2 avoids such sort specific keyword variants. 

As in the latest version of AIGER [4], BTOR2 supports bad state properties, 
which are essentially negations of safety properties. Multiple properties can be 
specified by simply adding multiple bad state properties. Invariant constraints 
can be introduced via the constraint keyword and are assumed to hold globally. 
A witness for a bad state property is an initialized finite path, which reaches 
(actually, contains) a bad state and satisfies all invariant constraints. 

Again as in AIGER [4], the keywords fair and justice allow specifying (global)
fairness constraints and (negations of) liveness properties. Each justice property
consists of a set of Büchi conditions. A witness for a justice property is an infinite
initialized path on which all Büchi conditions and all global fairness constraints
are satisfied infinitely often. In addition, all global invariant constraints have to
hold. The justice keyword takes a number (the number of Büchi conditions) and
an arbitrary number of nodes (the Büchi conditions) as arguments.


3 Witness Format 


The syntax of the BTOR2 witness format is shown in Fig. 2. A BTOR2 witness
consists of a sequence of valid input assignments grouped by (time) frames. It
starts with 'sat' followed by a list of properties that are satisfied by the witness.
A property is identified by a prefix 'b' (for bad) or 'j' (for justice) followed by
a number i, which ranges over the number of defined bad and justice properties
starting from 0. For example, 'b0 j0' refers to the first bad and first justice
property in the order as they occur in the BTOR2 input. The list of properties is
followed by a sequence of k + 1 frames at time t ∈ {0,...,k}. A frame is divided
into a state and input part. The state part starts with '#t' and is mandatory
for the first frame (t = 0) and optional for later frames (t > 0). It contains
state assignments at time t. The input part starts with '@t' and consists of input
assignments of the transition from time t to t + 1. If states are uninitialized
(no init), their initial assignment is required to be specified in frame '#0'. The
state part is usually omitted for t > 0 since state assignments can be computed
from states and inputs at time t - 1. While don't care inputs can be omitted,
our witness checker assumes that they are zero. Input and state assignments use
the same numbering scheme as properties, i.e., states and inputs are numbered
separately in the order they are defined, starting from 0. For example, 0 in
frame '#t' (or '@t') refers to the first state (or input) as defined in the BTOR2
input. For justice properties we assume the witness to be lasso shaped, i.e., the
next state, which can be computed from the last state and inputs at time k, is
identical to one of the previous states at time t = 0...k. As in AIGER, a BTOR2
witness is terminated with '.' on a separate line.


    <binary-string>    ::= [0-1]+
    <bv-assignment>    ::= <binary-string>
    <array-assignment> ::= '[' <binary-string> ']' <binary-string>
    <assignment>       ::= <uint> ( <bv-assignment> | <array-assignment> ) [<symbol>]
    <model>            ::= ( <comment>'\n' | <assignment>'\n' )+
    <state part>       ::= '#' <uint> '\n' <model>
    <input part>       ::= '@' <uint> '\n' <model>
    <frame>            ::= [<state part>] <input part>
    <prop>             ::= ( 'b' | 'j' ) <uint>
    <header>           ::= 'sat\n' ( <prop> )+ '\n'
    <witness>          ::= ( <comment>'\n' )+ | <header> ( <frame> )+ '.'


Fig. 2. BTOR2 model and witness format syntax (sequential part in red). (Color figure 
online) 
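
The frame structure of the witness format can be read with a few lines of Python. The sketch below is not the witness checker from the tool suite: `parse_witness` is a hypothetical helper that handles only bit-vector assignments (array assignments of the form '[index] value' are left out) and assumes that all properties are listed on the single line following 'sat'.

```python
from typing import Dict, List, Tuple

Frames = Dict[int, Dict[str, Dict[int, int]]]


def parse_witness(text: str) -> Tuple[List[str], Frames]:
    """Parse a BTOR2 witness into its property list and per-frame assignments."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith(";")]
    if not lines or lines[0] != "sat":
        raise ValueError("witness must start with 'sat'")
    props = lines[1].split()
    frames: Frames = {}
    part = None
    for ln in lines[2:]:
        if ln == ".":                      # terminator on a separate line
            break
        if ln[0] in "#@":                  # '#t' opens a state part, '@t' an input part
            key = "state" if ln[0] == "#" else "input"
            frame = frames.setdefault(int(ln[1:]), {"state": {}, "input": {}})
            part = frame[key]
        else:
            if part is None:
                raise ValueError("assignment outside of a frame")
            tok = ln.split()               # <uint> <binary-string> [<symbol>]
            part[int(tok[0])] = int(tok[1], 2)
    return props, frames


WITNESS = """\
sat
b0
#0
@0
0 1 turn@0
@1
0 0 turn@1
.
"""

props, frames = parse_witness(WITNESS)
print(props)
print(frames)
```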


Figure 3 illustrates a simple C program (left), the corresponding BTOR2
model with the negation of the assertion as a bad property (center), and a
C program (left):

    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdbool.h>
    static bool read_bool () {
      int ch = getc (stdin);
      if (ch == '0') return false;
      if (ch == '1') return true;
      exit (0);
    }
    int main () {
      bool turn;               // input
      unsigned a = 0, b = 0;   // states
      for (;;) {
        turn = read_bool ();
        assert (!(a == 3 && b == 3));
        if (turn) a = a + 1;
        else b = b + 1;
      }
    }

BTOR2 model (center):

    1 sort bitvec 1
    2 sort bitvec 32
    3 input 1 turn
    4 state 2 a
    5 state 2 b
    6 zero 2
    7 init 2 4 6
    8 init 2 5 6
    9 one 2
    10 add 2 4 9
    11 add 2 5 9
    12 ite 2 3 4 10
    13 ite 2 -3 5 11
    14 next 2 4 12
    15 next 2 5 13
    16 constd 2 3
    17 eq 1 4 16
    18 eq 1 5 16
    19 and 1 17 18
    20 bad 19

BTOR2 witness (right):

    sat
    b0
    #0
    @0
    0 1 turn@0
    @1
    0 0 turn@1
    @2
    0 0 turn@2
    @3
    0 0 turn@3
    @4
    0 1 turn@4
    @5
    0 1 turn@5
    @6
    0 0 turn@6
    .


Fig. 3. Example C program with corresponding BTOR2 model and witness. 


BTOR2 witness for the violated property (right). The BTOR2 model defines
one bad property (a == 3 && b == 3), which is satisfied in frame 6. The corresponding
witness identifies this property as bad property 'b0' (first bad property
defined in the model). All states are initialized, hence '#0' is empty, and '@0'
to '@6' indicate the assignments of input 0 (turn, the first input defined in the
model) in frames 0 to 6, e.g., turn = 1 at t = 0, turn = 0 at t = 1 and so
on. In frame 6, both states a and b reach value 3, and therefore property 'b0' is
satisfied.
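
The claim about frame 6 can be replayed by hand. The following Python sketch mirrors the loop of the C program and feeds it the seven values of turn from the witness; `first_bad_frame` is a hypothetical helper, not part of the BTOR2 simulator, and the 32-bit mask only documents the unsigned wrap-around, which is never reached here.

```python
MASK32 = 0xFFFFFFFF

# Values of input 'turn' in frames 0..6, as listed in the witness.
TURNS = [1, 0, 0, 0, 1, 1, 0]


def first_bad_frame(turns):
    """Return the first frame in which a == 3 and b == 3 holds, or None."""
    a = b = 0                          # both states are initialized to zero
    for t, turn in enumerate(turns):
        if a == 3 and b == 3:          # the bad property, checked on the current state
            return t
        if turn:
            a = (a + 1) & MASK32
        else:
            b = (b + 1) & MASK32
    return None


print(first_bad_frame(TURNS))  # 6
```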


4 Tools


We provide a generic stand-alone parser for BTOR2, which features basic type
checking and consists of approx. 1,500 lines of C code. We implemented a reference
bounded model checker BtorMC, which currently supports checking safety
(a.k.a. bad state) properties for models with registers and memories and produces
witnesses for satisfiable properties. Unrolling the model is performed by symbolic
simulation, i.e., symbolic substitution of current state expressions into next
state functions, and incremental SMT solving. We also implemented a simulator
for randomly simulating BTOR2 models. It further supports checking BTOR2
witnesses. The model checker is tightly integrated into our SMT solver Boolector [18],
an award-winning SMT solver for the theory of fixed-size bit-vectors
with arrays and uninterpreted functions. Since the last major version [18], we
extended Boolector with several new features. Most notably, Boolector 3.0 now
comes with support for quantified bit-vectors [24] and two different local search
strategies for quantifier-free bit-vector formulas that do not rely on but can be
combined with bit-blasting [19,21,22]. It further provides support for BTOR2.
In contrast to previous versions of Boolector, Boolector 3.0 and all BTOR2 tools
are released under the MIT open source license and the source code is hosted on
GitHub¹.
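
The bound-by-bound structure of bounded model checking can be sketched in Python on the counter example of Fig. 3. Note the substitution: BtorMC unrolls symbolically and queries an incremental SMT solver, whereas this sketch simply enumerates all input sequences of length k, which only works for tiny models. All function names are hypothetical, and 32-bit wrap-around is ignored because the bounds are small.

```python
from itertools import product


def step(state, turn):
    """Transition function of the counter example: increment a or b."""
    a, b = state
    return (a + 1, b) if turn else (a, b + 1)


def bad(state):
    """Bad state property: a == 3 and b == 3."""
    return state == (3, 3)


def bounded_check(init, max_bound):
    """Search bounds k = 0, 1, ... for an input sequence that reaches a bad state.

    Exhaustive enumeration stands in for the SMT query issued per bound.
    """
    for k in range(max_bound + 1):
        for inputs in product((0, 1), repeat=k):
            state = init
            for turn in inputs:
                state = step(state, turn)
            if bad(state):
                return k, inputs
    return None


result = bounded_check((0, 0), 10)
print(result)  # (6, (0, 0, 0, 1, 1, 1))
```

Increasing the bound one step at a time yields a shortest counterexample, here after six transitions.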


5 Experiments 


We collected ten real-world (System) Verilog designs with safety properties from
various open source projects [11,26-28]. The majority of these designs include
memories. We used the open synthesis suite Yosys [29] to synthesize these designs
into BTOR2 and SMT-LIB. For BTOR2, Yosys directly generates the models
from a circuit description. For SMT-LIB, since the language does not support
describing model checking problems, we used Yosys in combination with
Yosys-SMTBMC to produce unrolled (incremental) problems.

We compared BtorMC against the most recent versions of Boolector (3.0) 
and Yices [10] (2.5.4), the two best solvers of the QF_ABV division of the SMT 
competition 2017. The BTOR2 models serve as input for BtorMC, and the incre- 
mental SMT-LIB benchmarks serve as input for Boolector and Yices. All bench- 
marks, synthesis scripts, generated files, log files and the source code of our tools 
for this evaluation are available at http://fmv.jku.at/cav18-btor2.

The results in Table 2 show that our flow using BTOR2 as intermediate format
is competitive with simple unrolling. Note that our model checker BtorMC
issues incremental calls to Boolector. However, in Boolector, sophisticated word- 
level rewriting is currently disabled in incremental mode. We expect a major 
performance boost by fully supporting incremental word-level preprocessing. 


Table 2. BtorMC/BTOR2 vs. unrolled SMT-LIB with a time limit of 3600 s, where k 
is the bound and #bad is the number of bad properties. 


Benchmark                 |   k | #bad | BtorMC time[s] | Boolector time[s] | Yices time[s]
picorv32-check            |  30 |   23 |            4.8 |              18.9 |          10.8
picorv32-pcregs           |  20 |    3 |           63.0 |             293.0 |            TO
ponylink-slaveTXlen-sat   | 230 |    1 |          305.5 |             406.8 |         145.6
ponylink-slaveTXlen-unsat | 231 |    1 |          183.8 |             131.4 |          71.4
VexRiscv-regch0-15        |  17 |    2 |            9.6 |              48.3 |          12.2
VexRiscv-regch0-20        |  22 |    2 |          528.8 |             520.7 |        2232.2
VexRiscv-regch0-30        |  32 |    2 |             TO |                TO |            TO
zipcpu-busdelay           | 100 |   50 |          157.0 |             287.0 |         181.2
zipcpu-pfcache            | 100 |   39 |           17.4 |              19.9 |          32.5
zipcpu-zipmmu             |  30 |   57 |           86.0 |             412.9 |          46.5


¹ https://github.com/boolector.
6 Conclusion


We propose BTOR2, a new word-level model-checking and witness format. For 
this format we provide a generic parser implementation, a simulator that also 
checks witnesses, and a reference bounded model checker BtorMC, which is 
tightly integrated with our SMT solver Boolector. These open source tools are 
evaluated on new real-world benchmarks, which we synthesized from open source 
hardware (System) Verilog models into BTOR2 and SMT-LIB with Yosys. The 
tool Verilog2SMV [14] translates Verilog into model-checking problems in several 
formats, including nuXmv [7] and BTOR. However, its translation to BTOR is 
incomplete and its development has been discontinued.

We plan to provide a translator from BTOR2 into SALLY [25] and VMT [8],
which are both extensions of SMT-LIB to model symbolic transition systems.
It might also be interesting to translate incremental SMT-LIB benchmarks and
Horn clause models (as handled by, e.g., μZ [13]) into BTOR2 and vice versa.
We hope other compilers and model checkers such as SAL [9], EBMC [15] and 
ABC [12,16] will provide support to produce and read BTOR2 models. We want 
to extend the format to other logics, in particular to support lambdas as in [23]. 
There is also a need for fuzzing [20] and delta-debugging tools [17]. 

Last but not least, we want to use this format to bootstrap a word-level 
model checking competition, which of course needs more benchmarks. 
Abstract. We present Nagini, an automated, modular verifier for
statically-typed, concurrent Python 3 programs, built on the Viper
verification infrastructure. Combining established concepts with new ideas,
Nagini can verify memory safety, functional properties, termination, 
deadlock freedom, and input/output behavior. Our experiments show 
that Nagini is able to verify non-trivial properties of real-world Python 
code. 


1 Introduction 


Dynamic languages have become widely used because of their expressiveness 
and ease of use. The Python language in particular is popular in domains like 
teaching, prototyping, and more recently data science. Python’s lack of safety 
guarantees can be problematic when, as is increasingly the case, it is used for 
critical applications with high correctness demands. The Python community has 
reacted to this trend by integrating type annotations and optional static type 
checking into the language [20]. However, there is currently virtually no tool 
support for reasoning about Python programs beyond type safety. 

We present Nagini, a sound verifier for statically-typed, concurrent Python 
programs. Nagini can prove memory safety, data race freedom, and user-supplied 
assertions. Nagini performs modular verification, which is important for verifi- 
cation to scale and to be able to verify libraries, and automates the verification 
process for programs annotated with specifications. 

Nagini builds on many techniques established in existing tools: (1) Like
VeriFast [10] and other tools [4,19,22], it uses separation logic style permissions [16]
in order to locally reason about concurrent programs. (2) Like .NET Code Con- 
tracts [7], it uses a contract library to enable users to write code-level spec- 
ifications. (3) Like many verification tools [2,6,11,13], it verifies programs by 
encoding the program and its specification into an intermediate verification lan- 
guage [1,8], namely Viper [14], for which automatic verifiers already exist. 

Nagini combines these techniques with new ideas in order to verify advanced 
properties and handle the dynamic aspects of Python. In particular, Nagini 
implements a comprehensive system for verifying finite blocking [5] and 
input/output behavior [18], and builds on Mypy [12] to verify safety while also 
supporting important dynamic language features. Nagini is intended for veri- 
fying substantial, real-world code, and is currently used to verify the Python 
implementation of the SCION internet architecture [3]. To our knowledge, it
is the first tool to enable automatic verification of Python code. Existing tools 
for JavaScript [21,24] also target a dynamic language, but focus on faithfully 
modeling JavaScript’s complex semantics rather than practical verification of 
high-level properties. 

Due to its wide range of verifiable properties, Nagini has applications in 
many domains: In addition to memory safety, programmers can choose to prove 
that a server implementation will stay responsive, that data science code has 
desired functional properties, or that algorithms terminate and preserve certain 
invariants, for example in a teaching context. Nagini is open-source and available 
online¹, and can be used from the popular PyCharm IDE via a prototype plugin.

In this paper, we describe Nagini’s supported Python subset and specification 
language, give an overview of its implementation and the encoding from Python 
to Viper, and provide an experimental evaluation of Nagini on real-world code. 


2 Language and Specifications 


Python Subset: Nagini requires input programs to comply with the static,
nominal type system defined in PEP 484 [20] as implemented in the Mypy type
checker [12], which requires type annotations for function parameters and return 
types, but can normally infer types of local variables. Nagini fully supports the 
non-gradual part of Mypy’s type system, including generics and union types. 

The Python subset accepted by Mypy and Nagini can accommodate most 
real Python programs, potentially via some workarounds like using union types 
instead of structural typing. While our subset is statically typed, it includes many 
features and potential pitfalls not found in static languages, such as dynamic 
addition and removal of fields from objects. Some other features like reflection and
dynamic code generation are not supported. 

Where compromises are necessary, Nagini aims for modularity, performance, 
and completeness for features typically found in user code over general sup- 
port for all language features. As an example, Nagini works with a simplified 
model of Python’s object attribute lookup behavior: A simple attribute access 
in Python leads to the invocation of several “magic” methods, which, if mod- 
elled correctly, would result in an overhead that would likely make automatic 
verification intractable. Nagini exploits the fact that these methods are mostly 
used to implement decorators, metaclasses, and system libraries, but rarely in 
user code. It assumes the default behavior of those methods, and implements 
direct support for frequently-used decorators and metaclasses that change their 
behavior. Importantly, Nagini flags an error if verified programs override these 
methods or are otherwise outside the supported subset, and is therefore sound. 
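
The effect of such a "magic" method is easy to see in plain Python. In the sketch below (class names are illustrative), overriding __getattribute__ changes the result of an ordinary attribute access, which is the kind of behavior a verifier would have to account for on every access if it did not assume the default lookup.

```python
class Plain:
    def __init__(self) -> None:
        self.seat = 7


class Intercepting(Plain):
    def __getattribute__(self, name: str):
        # Invoked on every attribute access on instances of this class.
        if name == "seat":
            return -1
        return super().__getattribute__(name)


print(Plain().seat, Intercepting().seat)  # 7 -1
```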


Specification Language: Nagini includes a library of specification functions sim- 
ilar to .NET Code Contracts [7] to express pre- and postconditions, loop invari- 
ants, and other assertions. Calls to these functions are interpreted as specifica- 
tions by Nagini, but can be automatically removed before execution. Users can 


¹ https://github.com/marcoeilers/nagini.
     1  from nagini_contracts.contracts import *
     2  from typing import List
     3  import db
     4
     5  class Ticket:
     6      def __init__(self, show: int, row: int, seat: int) -> None:
     7          self.show_id = show
     8          self.row, self.seat = row, seat
     9          Fold(self.state())
    10          Ensures(self.state() and MayCreate(self, 'discount_code'))
    11
    12      @Predicate
    13      def state(self) -> bool:
    14          return Acc(self.show_id) and Acc(self.row) and Acc(self.seat)
    15
    16  def order_tickets(num: int, show_id: int, code: str=None) -> List[Ticket]:
    17      Requires(num > 0)
    18      Exsures(SoldoutException, True)
    19      seats = db.get_seats(show_id, num)
    20      res = []  # type: List[Ticket]
    21      for row, seat in seats:
    22          Invariant(list_pred(res))
    23          Invariant(Forall(res, lambda t: t.state() and
    24                    Implies(code is not None, Acc(t.discount_code))))
    25          Invariant(MustTerminate(len(seats) - len(res)))
    26          ticket = Ticket(show_id, row, seat)
    27          if code:
    28              ticket.discount_code = code
    29          res.append(ticket)
    30      return res


Fig. 1. Example program demonstrating Nagini's specification language. Contract
functions are highlighted in italics. Note that functional specifications and
postconditions are largely omitted to highlight the different specification constructs.


annotate Mypy-style type stub files for external libraries with specifications; the 
program will then be verified assuming they are correct. A detailed explanation 
of the specification language can be found in Nagini's Wiki².

An example of an annotated program is shown in Fig. 1. The first two lines 
import the contract library and Python’s library for type annotations. Pre- 
and postconditions are declared via calls to the contract functions Requires and 
Ensures in lines 17 and 10, respectively. The arguments of these functions are 
interpreted as assertions, which can be side-effect free boolean Python expres- 
sions or calls to other contract functions. Similarly, loops must be annotated 
with invariants (line 22), and special exceptional postconditions specify which 
exceptions a method may raise, and what postconditions must hold in this case. 
The Exsures annotation in line 18 states that a SoldoutException may be raised 
and makes no guarantees in this case. The invariant MustTerminate in line 25 
specifies that the loop terminates; the argument represents a ranking function [5]. 
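
The role of the ranking function can be illustrated with a runtime check in plain Python. Nagini establishes the property statically; the sketch below (with the hypothetical function `collect`) merely asserts, in every iteration of a loop shaped like the one in Fig. 1, that len(seats) - len(res) strictly decreases and never becomes negative.

```python
from typing import List, Tuple


def collect(seats: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Copy seats into res while checking the ranking function at runtime."""
    res: List[Tuple[int, int]] = []
    for row, seat in seats:
        before = len(seats) - len(res)   # value of the ranking function
        res.append((row, seat))
        after = len(seats) - len(res)
        # Strictly decreasing and bounded from below by zero.
        assert 0 <= after < before
    return res


print(collect([(1, 1), (1, 2), (2, 5)]))
```

A measure that decreases in every iteration and is bounded from below cannot decrease forever, which is why it witnesses termination of the loop.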

Like the underlying Viper language, Nagini uses Implicit Dynamic Frames 
(IDF) [23], a variation of separation logic [16], to achieve framing and allow local 
reasoning in the presence of concurrency. IDF establishes a system of permis- 
sions for heap locations that roughly corresponds to separation logic's points-to 
predicates. Methods may only read or write heap locations they currently hold 
a permission for, and can specify which permissions they require from and give 


² https://github.com/marcoeilers/nagini/wiki.
back to their caller in their pre- and postconditions. Since there is only ever a
single permission per heap location, holding a permission guarantees that neither 
other threads nor called methods can modify the respective location. 

In Nagini, a permission is created when a field is assigned to for the first 
time; e.g., when executing line 9, the __init__ method will have permission to
three fields. Permission assertions are expressed using the Acc function (line 14). 
Assertions can be abstracted over using predicates [17], declared in Nagini by 
using annotated functions (line 12). In the example, the constructor of Ticket 
bundles all available permissions in the predicate state using the ghost state- 
ment Fold in line 9 and subsequently returns this predicate to its caller via its 
postcondition. 

In addition, Nagini offers a second kind of permission that allows creating a
field that does not currently exist, but cannot be used for reading (since that
would cause a runtime error). Constructors implicitly get this kind of permission
for every field mentioned in a class; in the example, such a permission is
returned to the caller (line 10) and used in line 28. The loop invariant contains
the permission to modify the res list using one of several built-in predicates for
Python's standard data types (line 22) as well as permissions to the fields of all
objects in the list (line 23). This kind of quantified permission [15], corresponding
to separation logic's iterated separating conjunction, is one of two supported
ways to express permissions over unbounded numbers of heap locations.
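
The runtime behavior that motivates the second kind of permission can be observed directly in Python. In the following sketch (a stripped-down Ticket class without contracts, written for illustration), reading a field that has not been created yet raises an AttributeError, whereas assigning to it creates the field.

```python
class Ticket:
    def __init__(self, show: int) -> None:
        self.show_id = show


t = Ticket(1)

try:
    t.discount_code            # the field does not exist yet
    read_failed = False
except AttributeError:
    read_failed = True

t.discount_code = "SAVE10"     # the assignment creates the field dynamically

print(read_failed, t.discount_code)  # True SAVE10
```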

Other contract functions allow specifying, e.g., I/O behavior, and some have 
variations for advanced users, e.g., the Forall function can take trigger expressions 
to specify when the underlying SMT solver should instantiate the quantifier. 


Verified properties: Nagini verifies some safety properties by default: Verified 
programs will not raise runtime errors or undeclared exceptions. The permission 
system guarantees that verified code is memory safe and free of data races. 
Nagini also verifies some properties that Mypy only checks optimistically, e.g., 
that referenced names are defined before they are used. As an example, if the 
Ticket class were defined after the order_tickets function, Nagini would not allow
calls to the function before the class definition, because of the call in line 26. 
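
The underlying runtime error is easy to reproduce. In the following sketch (names are illustrative and unrelated to Fig. 1), a function that refers to a class is called once before and once after the class statement has been executed at the top level of the module.

```python
def make_ticket():
    # 'Ticket' is looked up when the function is called, not when it is defined.
    return Ticket()


try:
    make_ticket()              # the class statement has not been executed yet
    early_call_failed = False
except NameError:
    early_call_failed = True


class Ticket:
    pass


late = make_ticket()           # succeeds now that the name is defined

print(early_call_failed, type(late).__name__)  # True Ticket
```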

Beyond this, Nagini can verify (1) functional properties, (2) input/output 
properties, i.e., which I/O operations may or must occur, using a generalization 
of the method by Penninckx et al. [18], and (3) finite blocking [5], i.e., that no 
thread blocks indefinitely when trying to acquire a lock or join another thread, 
which includes deadlock freedom and termination. Verification is modular in the 
sense that adding code to a program only requires verifying the added parts; any 
code that verified before is guaranteed to still verify. Top level statements are 
an exception and have to be reverified when any part of the program changes, 
since Python's import mechanism is inherently non-modular. 


3 Implementation 


Nagini's verification workflow is depicted in Fig. 2. After parsing, Nagini invokes 
the Mypy type checker on the input and rejects the program if errors are found. 


600 M. Eilers and P. Müller 


[Figure: workflow diagram. Legible labels: Python Program, Python AST, Analyzer,
Model, Translator, Viper AST, SE, Python Error, Viper Error.]


Fig. 2. Nagini verification workflow. 


It then analyzes the input program and extracts structural information into an 
internal model, which is then encoded into a Viper program. The program is 
verified using one of the two Viper backends, based on either symbolic execu- 
tion (SE) or verification condition generation (VCG), respectively. Any resulting 
Viper-level error messages are mapped back to a Python-level error. 


Encoding: Nagini encodes Python programs into Viper programs that verify only 
if the original program was correct. At the top level, Viper programs consist 
of methods, whose bodies contain imperative code, side-effect free functions, 
and the aforementioned predicates, as well as domains, which can be used to 
declare and axiomatize custom data types. The structure of a created Viper 
program roughly follows the structure of the Python program: Each function in 
the Python program corresponds to either a method, a function, or a predicate 
in the Viper program, depending on its annotation. Additional Viper methods 
are generated to check proof obligations like behavioral subtyping and to model 
the execution of all top level statements. 

Nagini maintains various kinds of ghost state, e.g., for verifying finite blocking 
and to represent which names are currently defined. It models Python's type sys- 
tem using a Viper domain axiomatized to reflect subtype relations. Nagini desug- 
ars complex Python language constructs into simple ones that exist in Viper, but 
subtle language differences often require additional effort in the encoding. As an 
example, Viper distinguishes references from primitive values whereas Python 
does not, requiring boxing and unboxing operations in the encoding. 


Tool interaction: Nagini is invoked on an annotated Python file, and verifies 
this file and all (transitive) imports without user interaction. It then outputs 
either a success message or Python-level error messages that indicate type or 
verification errors, use of unsupported features, or invalid specifications, along 
with the source location. As an example, removing the Fold statement in line 9 of 
Fig. 1 yields the error message “Postcondition of __init__ might not hold. There
might be insufficient permission to access self.state(). (example.py@10.16)”. 


4 Evaluation 


In addition to having a comprehensive test suite of over 12,500 lines of code,
we have evaluated Nagini on a set of examples containing (parts of) implementations
of standard algorithms from the internet³, the example from Fig. 1, a
class from the SCION implementation, as well as examples from other verifiers
translated to Python. Figure 3 shows the examples and which properties were
verified; the functional property we proved for the binary search tree implementation
is that it maintains a sorted tree. The examples cover language features
like inheritance (example 10), comprehensions (3), dynamic field addition (6),
operator overloading (3), union types (4), threads and locks (9), as well as
specification constructs like quantified permissions (6) and predicate families (10).
Nagini correctly finds an error in the SCION example and successfully verifies
all other examples.


   Example                      | LOC / Spec. | Viper LOC | SF | FC | FB | IO | TSeq  | TPar
 1 rosetta/quicksort            |  31 / 10    |   635     | ✓  | -  | ✓  | -  |  8.48 |  8.31
 2 interactivepython/bst        | 145 / 65    |   947     | ✓  | ✓  | -  | -  | 57.4  | 41.80
 3 keon/knapsack                |  33 / 10    |   864     | ✓  | -  | -  | -  | 19.39 | 14.49
 4 wikipedia/duck_typing        |  19 / 0     |   486     | ✓  | -  | -  | -  |  1.82 |  1.92
 5 scion/path_store             | 207 / 94    |  2133     | ✗  | -  | -  | -  | 91.37 | 35.26
 6 example                      |  40 / 19    |   736     | ✓  | -  | ✓  | -  |  6.41 |  5.91
 7 verifast/brackets_checker    | 143 / 82    |  1081     | ✓  | ✓  | ✓  | ✓  |  7.66 |  6.63
 8 verifast/putchar_with_buffer | 139 / 88    |   865     | ✓  | ?  | ?  | ?  |  ?    | 14.29
 9 chalice2viper/watchdog       |  66 / 22    |   769     | ✓  | -  | ✓  | -  |  3.66 |  3.41
10 parkinson/recell             |  46 / 25    |   561     | ✓  | ✓  | -  | -  |  2.09 |  2.07


Fig. 3. Experiments. For each example, we list the lines of code (excluding whitespace
and comments), the number of those lines that are used for specifications, the length
of the resulting Viper program, properties (SF = safety, FC = functional correctness,
FB = finite blocking, IO = input/output behavior) that could be verified (✓), could not
be verified (✗) or were not attempted (-), and the verification times with Viper's SE
backend, sequential and parallelized, in seconds.

The runtimes shown in Fig. 3 were measured by averaging over ten runs on 
a Lenovo Thinkpad T450s running Ubuntu 16.04, Python 3.5 and OpenJDK 8 
on a warmed-up JVM. They show that Nagini can effectively verify non-trivial 
properties of real-life Python programs in reasonable time. Due to modular veri- 
fication, parts of a program can be verified independently and in parallel (which 
Nagini does by default), so that larger programs will not inherently lead to 
performance problems. This is demonstrated by the speedup achieved via par- 
allelization on the two larger examples; for the smaller ones, verification time is 
dominated by a single complex method. Additionally, the annotation overhead 
is well within the range of other verification tools [9]. 


Acknowledgements. Thanks to Vytautas Astrauskas, Samuel Hitz, and Fábio Pakk 
Selmi-Dei for their contributions to Nagini. We gratefully acknowledge support from 
the Zurich Information Security and Privacy Center (ZISC). 


3 We chose examples that do not make use of dynamic features or external libraries 
from rosettacode.org, interactivepython.org and github.com/keon/algorithms. 
Abstract. We introduce PEREGRINE, the first tool for the analysis and 
parameterized verification of population protocols. Population protocols 
are a model of computation very much studied by the distributed com- 
puting community, in which mobile anonymous agents interact stochas- 
tically to achieve a common task. PEREGRINE allows users to design 
protocols, to simulate them both manually and automatically, to gather 
statistics of properties such as convergence speed, and to verify correct- 
ness automatically. This paper describes the features of PEREGRINE and 
their implementation. 


Keywords: Population protocols · Distributed computing 
1 Introduction 


Population protocols [1,3,4] are a model of distributed computing in which repli- 
cated, mobile agents with limited computational power interact stochastically to 
achieve a common task. They provide a simple and elegant formalism to model, 
e.g., networks of passively mobile sensors [1,5], trust propagation [13], evolu- 
tionary dynamics [14], and chemical systems, under the name chemical reaction 
networks [12,16,19]. 

Population protocols are parameterized: the number of agents does not 
change during the execution of the protocol, but is a priori unbounded. A 
protocol is correct if it behaves correctly for all of its infinitely many initial 
configurations. For this reason, it is challenging to design correct and efficient 
protocols. 

In this paper we introduce PEREGRINE, the first tool for the parameterized 
analysis of population protocols. PEREGRINE is intended for use by researchers 
in distributed computing and systems biology. It allows the user to specify pro- 
tocols either through an editor or as simple scripts, and to analyze them via a 
graphical interface. The analysis features of PEREGRINE include manual 
step-by-step simulation; automatic sampling; statistics generation of average conver- 
gence speed; detection of incorrect executions through simulation; and formal 
verification of correctness. The first four features are supported for all protocols, 
while verification is supported for silent protocols, a large subclass of proto- 
cols [6]. Verification is performed automatically over all of the infinitely many 
initial configurations using the recent approach of [6] for solving the so-called 
well-specification problem. 


Related Work. The problem of automatically verifying that a population proto- 
col conforms to its specification for one fixed initial configuration has been con- 
sidered in [10,11,17,20]. In [10], ad hoc search algorithms are used. In [11,17], 
the authors show how to model the problem in the probabilistic model checker 
PRISM, and under certain conditions in SPIN. In [20], the problem is modeled 
with the PAT toolkit for model checking under fairness assumptions. All these 
tools increase our confidence in the correctness of a protocol. However, compared 
to PEREGRINE, they are not visual tools, they do not offer simulation capabili- 
ties, and they can only verify the correctness of a protocol for a finite number 
of initial configurations, with typically a small number of agents. PEREGRINE 
proves correctness for all of the infinitely many initial configurations, with an 
arbitrarily large number of agents. 

As mentioned in the introduction, population protocols are isomorphic to 
chemical reaction networks (CRNs), a popular model in natural computing. 
Cardelli et al. have recently developed model checking techniques and analysis 
algorithms for stochastic CRNs [7-9]. The problems studied therein are incom- 
parable to the parameterized questions addressed by PEREGRINE. 

The verification algorithm of PEREGRINE is based on [6], where a novel app- 
roach for the parameterized verification of silent population protocols has been 
presented. The command-line tool of [6] only offers support for proving correct- 
ness, with no functionality for visualization or simulation. Further, contrary to 
PEREGRINE, the tool cannot produce counterexamples when correctness fails. 


2 Population Protocols 


We introduce population protocols through a simple example and then briefly 
formalize the model. We refer the reader to [4] for a more thorough but still 
intuitive presentation. Suppose anonymous and mobile agents wish to take a 
majority vote. Intuitively, anonymous means that agents have no identity, and 
mobile that agents are “wandering around”, and can only interact whenever they 
bump into each other. In order to vote, all agents conduct the following protocol. 
Each agent is in one of four states {Y, N, y, n}. Initially all agents are in the 
states Y or N, corresponding to how they want to vote (states y, n are auxiliary 
states). Agents repeatedly interact pairwise according to the following rules: 


$$a: Y\,N \mapsto y\,n \qquad b: Y\,n \mapsto Y\,y \qquad c: N\,y \mapsto N\,n \qquad d: y\,n \mapsto y\,y$$

For example, if the population initially has two agents of opinion “yes” and one 
agent of opinion “no”, then a possible execution is: 


$$\lbag Y, Y, N \rbag \xrightarrow{a} \lbag y, Y, n \rbag \xrightarrow{b} \lbag y, Y, y \rbag, \qquad (1)$$


where e.g. $\lbag Y, Y, N \rbag$ denotes the multiset with two agents in state Y and one 
agent in state N. 

The goal of every population protocol is to ensure that the agents eventually 
reach a lasting consensus, i.e., a multiset in which (1) either all agents are in 
“yes”-states, or all agents are in “no”-states, and (2) further interactions do 
not destroy the consensus. On top of this universal specification, each protocol 
has an individual goal, determining which initial configurations should reach the 
“yes” and the “no” lasting consensus. In the majority protocol above, the agents 
should reach a “yes”-consensus iff 50% or more agents vote “yes”. 

Execution (1) above leads to a lasting “yes”-consensus; further, the consensus 
is the right one, since 2 out of 3 agents voted “yes”. In fact, assuming agents 
interact uniformly and independently at random, the above protocol is correct: 
executions almost surely reach a correct lasting consensus. 
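
The rules a to d and the uniform random scheduler can be sketched in a few lines of Python. This is only an illustrative simulation, not part of PEREGRINE; the names `RULES`, `step`, `is_terminal` and `run`, as well as the seed and the step bound, are assumptions made for the example.

```python
import random
from collections import Counter

# Rules a-d of the majority protocol: pair of states -> resulting pair.
RULES = {
    ("Y", "N"): ("y", "n"),  # a
    ("Y", "n"): ("Y", "y"),  # b
    ("N", "y"): ("N", "n"),  # c
    ("y", "n"): ("y", "y"),  # d
}

def step(agents, rng):
    """Let two distinct agents, chosen uniformly at random, interact in place."""
    i, j = rng.sample(range(len(agents)), 2)
    pair = (agents[i], agents[j])
    if pair in RULES:
        agents[i], agents[j] = RULES[pair]
    elif pair[::-1] in RULES:
        agents[j], agents[i] = RULES[pair[::-1]]
    # otherwise the interaction has no effect

def is_terminal(agents):
    """True if no rule is enabled, i.e. no rule has both of its states present."""
    count = Counter(agents)
    return not any(count[p] > 0 and count[q] > 0 for p, q in RULES)

def run(agents, seed=0, max_steps=100_000):
    """Simulate until a terminal configuration (or the step bound) is reached."""
    rng = random.Random(seed)
    agents = list(agents)
    for _ in range(max_steps):
        if is_terminal(agents):
            break
        step(agents, rng)
    return Counter(agents)
```

Calling `run(["Y", "Y", "N"])` ends in the multiset with one agent in state Y and two in state y, the last configuration of execution (1).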

More formally, a population protocol is a tuple $(Q, T, I, O)$ where $Q$ is a 
finite set of states, $T \subseteq Q^2 \times Q^2$ is a set of transitions, $I \subseteq Q$ are the initial 
states and $O : Q \to \{0,1\}$ is the output mapping. A configuration is a non-empty 
multiset over $Q$, an initial configuration is a non-empty multiset over $I$, and a 
configuration is terminal if it cannot be altered by any transition. A configuration 
is in a consensus if all of its states map to the same output under $O$. 


An execution is a finite or infinite sequence $C_0 \xrightarrow{t_1} C_1 \xrightarrow{t_2} \cdots$ such that $C_i$ is 
obtained from applying transition $t_i$ to $C_{i-1}$. A fair execution is either a finite 
execution that reaches a terminal configuration, or an infinite execution such 
that if $\{i \in \mathbb{N} : C_i \xrightarrow{*} D\}$ is infinite, then $\{i \in \mathbb{N} : C_i = D\}$ is infinite for any 
configuration D. In other words, fairness ensures that a configuration cannot be 
avoided forever if it is reachable infinitely often. Fairness is an abstraction of 
the random interactions occurring within a population. A configuration C is in 
a lasting consensus if every execution from C only leads to configurations of the 
same consensus. 
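
For one fixed configuration, these definitions can be checked by explicit enumeration, since the number of agents never changes and the set of reachable configurations is therefore finite. The sketch below encodes the majority protocol as a tuple (Q, T, I, O) and configurations as `Counter` multisets; all function names are hypothetical, and this is not how PEREGRINE reasons about infinitely many configurations.

```python
from collections import Counter

# Majority protocol of this section as a tuple (Q, T, I, O).
Q = {"Y", "N", "y", "n"}
T = [(("Y", "N"), ("y", "n")), (("Y", "n"), ("Y", "y")),
     (("N", "y"), ("N", "n")), (("y", "n"), ("y", "y"))]
I = {"Y", "N"}
O = {"Y": 1, "y": 1, "N": 0, "n": 0}

def successors(config):
    """Configurations obtained by firing one enabled transition that alters config."""
    result = []
    for pre, post in T:
        need = Counter(pre)
        if all(config[q] >= k for q, k in need.items()):
            nxt = config - need + Counter(post)
            if nxt != config:
                result.append(nxt)
    return result

def reachable(config):
    """All configurations reachable from config, including config itself."""
    key = lambda c: tuple(sorted(c.items()))
    seen, stack = {key(config): config}, [config]
    while stack:
        for nxt in successors(stack.pop()):
            if key(nxt) not in seen:
                seen[key(nxt)] = nxt
                stack.append(nxt)
    return list(seen.values())

def is_terminal(config):
    return not successors(config)

def consensus(config):
    """Common output b if config is in a consensus, otherwise None."""
    outputs = {O[q] for q in config}
    return outputs.pop() if len(outputs) == 1 else None

def lasting_consensus(config):
    """b if every configuration reachable from config is of consensus b, else None."""
    outputs = {consensus(c) for c in reachable(config)}
    return outputs.pop() if len(outputs) == 1 else None
```

For instance, `lasting_consensus(Counter({"Y": 1, "y": 2}))` yields 1, whereas the initial configuration `Counter({"Y": 2, "N": 1})` is not even in a consensus.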

If for every initial configuration C, all fair executions from C lead to a 
lasting consensus $\varphi(C) \in \{0,1\}$, then we say that the protocol computes the 
predicate $\varphi$. For example, the above majority protocol with O(Y) = O(y) = 1 and 
O(N) = O(n) = 0 computes the predicate $C[Y] \geq C[N]$, where C[x] denotes the 
number of occurrences of state x in C. A protocol does not necessarily compute a 
predicate. For example, if we alter the majority protocol by removing transition 
d, then $\lbag Y, N \rbag \xrightarrow{a} \lbag y, n \rbag$ is a fair execution, but $\lbag y, n \rbag$ is not in a consensus. In 
other words, transition d acts as a tie-breaker which allows the protocol to reach the 
consensus configuration $\lbag y, y \rbag$. A protocol that computes a predicate is said to be 
well-specified. It is well-known that well-specified population protocols compute 
precisely the predicates definable in Presburger arithmetic [3]. On top of 
different majority protocols for the predicate $C[x] \geq C[y]$, the literature contains, e.g., 
different families of so-called flock-of-birds protocols for the predicates $C[x] \geq c$, 
where c is an integer constant, and families of threshold protocols for the 
predicates $a_1 \cdot C[x_1] + \cdots + a_n \cdot C[x_n] \geq c$, where $a_1, \ldots, a_n, c$ are integer constants 
and $x_1, \ldots, x_n$ are initial states. 


3 Analyzing Population Protocols 


PEREGRINE is a web tool with a JavaScript frontend and a Haskell backend. 
The backend makes use of the SMT solver Z3 [15] to test satisfiability of Pres- 
burger arithmetic formulas. The user has access to four main features through 
the graphical frontend. We present these features in the remainder of the section. 


Protocol Description. PEREGRINE offers a description language for both sin- 
gle protocols and families of protocols depending on some parameters. Single 
protocols are described either through a graphical editor or as simple Python 
scripts. Families of protocols (called parametric protocols) can only be specified 
as scripts, but PEREGRINE assists the user by generating a code skeleton. 


Simulation. Population protocols can be simulated through a graphical player 
depicted in Fig. 1. The user can pick an initial configuration and simulate the 
protocol by either manual selection of interactions, or by letting a scheduler 
pick interactions uniformly at random. The simulator keeps a history of the 
execution which can be rewound at any time, making it easy to experiment with 
the different behaviours of a protocol. Configurations can be displayed in two 
ways: either as explicit populations, as illustrated in Fig. 1, or as bar charts of 
the state counts, which are more convenient for large populations. 


Fig. 1. Simulation of the majority protocol from the initial configuration $\lbag 5 \cdot Y, 10 \cdot N \rbag$. 


Statistics. PEREGRINE can generate statistics from batch simulations. The user 
provides four parameters: $s_{\min}$, $s_{\max}$, $m$ and $n$. PEREGRINE generates n random 
executions as follows. For each execution, a number s is picked uniformly at 
random from $[s_{\min}, s_{\max}]$, and an initial configuration of size s is then picked 
uniformly at random. Each step of an execution is picked uniformly at random 
among enabled interactions. If no terminal configuration is reached within m 
steps, then the simulation halts. In the end, n executions of length at most m 
are gathered. PEREGRINE classifies the generated executions according to their 
consensus, and computes statistics on the convergence speed (see the next two 
paragraphs). The results can be visualized in different ways, and the raw data 
can be exported as a JSON file. 
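
The sampling procedure can be sketched as follows. Two points are assumptions of this sketch rather than statements about PEREGRINE: initial configurations are drawn uniformly among all multisets of size s over the initial states (via a stars-and-bars encoding), and a step is drawn uniformly among the pairs of agents that match a rule. All function names are hypothetical.

```python
import random

# Majority protocol of Sect. 2, used here only as sample input.
MAJORITY = {("Y", "N"): ("y", "n"), ("Y", "n"): ("Y", "y"),
            ("N", "y"): ("N", "n"), ("y", "n"): ("y", "y")}

def random_initial_configuration(initial_states, size, rng):
    """Uniform multiset of the given size over initial_states (stars and bars)."""
    k = len(initial_states)
    bars = sorted(rng.sample(range(size + k - 1), k - 1))
    counts, prev = [], -1
    for b in bars + [size + k - 1]:
        counts.append(b - prev - 1)   # number of "stars" between two bars
        prev = b
    return [q for q, c in zip(initial_states, counts) for _ in range(c)]

def enabled(agents, rules):
    """Enabled interactions as (i, j, post) for agent pairs matching a rule."""
    return [(i, j, rules[(p, q)])
            for i, p in enumerate(agents) for j, q in enumerate(agents)
            if i != j and (p, q) in rules]

def random_execution(agents, rules, m, rng):
    """Run at most m steps; stop early when a terminal configuration is reached."""
    trace = [sorted(agents)]
    for _ in range(m):
        choices = enabled(agents, rules)
        if not choices:
            break
        i, j, (p2, q2) = rng.choice(choices)
        agents[i], agents[j] = p2, q2
        trace.append(sorted(agents))
    return trace

def batch(rules, initial_states, s_min, s_max, m, n, seed=0):
    """Generate n random executions of length at most m."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n):
        s = rng.randint(s_min, s_max)
        agents = random_initial_configuration(initial_states, s, rng)
        runs.append(random_execution(agents, rules, m, rng))
    return runs
```

A call such as `batch(MAJORITY, ["Y", "N"], 2, 10, 50, 20)` returns 20 traces, each a list of configurations of one fixed size between 2 and 10.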


Consensus. For each random execution, PEREGRINE checks whether the last 
configuration of an execution is in a consensus and, if so, whether the consensus 
corresponds to the expected output of the protocol. PEREGRINE reports which 
percentage of the executions reach a consensus, and whether the consensus is cor- 
rect and/or lasting. In normal mode, PEREGRINE only classifies an execution as 
lasting consensus if it ends in a terminal configuration. In the increased accuracy 
mode, if the execution ends in a configuration C of consensus $b \in \{0,1\}$, then 
the model checker LoLA [18] is used to determine whether there exists a 
configuration $C'$ such that $C \xrightarrow{*} C'$ and $C'$ is not of consensus b. If it is not the case, 
then PEREGRINE concludes that C is in a lasting consensus. PEREGRINE plots 
the percentage of executions in each category as a function of the population 
size, as illustrated on the left of Fig. 2. 
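
As an illustration, the normal-mode classification can be written as a small function over the last configuration of an execution. The category names follow the legend of Fig. 2; the function names and the way the expected output is passed in are assumptions of this sketch.

```python
# Output mapping of the majority protocol of Sect. 2, used as sample input.
MAJORITY_OUTPUT = {"Y": 1, "y": 1, "N": 0, "n": 0}

def output_of(config, output):
    """Common output of all states in config, or None if there is no consensus."""
    outs = {output[q] for q in config}
    return outs.pop() if len(outs) == 1 else None

def classify(last_config, terminal, expected, output):
    """Normal mode: a consensus counts as lasting only if the execution
    ended in a terminal configuration."""
    b = output_of(last_config, output)
    if b is None:
        return "no consensus"
    kind = "correct" if b == expected else "incorrect"
    return "lasting " + kind if terminal else kind
```

For the last configuration of execution (1), `classify(["y", "Y", "y"], True, 1, MAJORITY_OUTPUT)` gives `"lasting correct"`.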


Average Convergence Speed. PEREGRINE also provides statistics on the 
convergence speed of a protocol. Let $C_0 \xrightarrow{t_1} C_1 \xrightarrow{t_2} \cdots \xrightarrow{t_n} C_n$ be an execution such 
that $C_n$ is in a consensus $b \in \{0,1\}$. The number of steps to convergence of the 
execution is defined as 0 if all configurations are of consensus b, and otherwise as 
i+1, where i is the largest index such that $C_i$ is not in consensus b. For each 
population size, PEREGRINE computes the average number of steps to convergence 
of all consensus executions of that population size, and plots the information as 
illustrated on the right of Fig. 2. 
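
The definition of the number of steps to convergence translates directly into code. In this sketch an execution is given as the list of consensus values of its configurations, with `None` for configurations that are not in a consensus; this encoding and the function name are assumptions made for the example.

```python
def steps_to_convergence(outputs):
    """outputs[i] is the consensus value (0 or 1) of C_i, or None if C_i is
    not in a consensus. The last configuration must be in a consensus b."""
    b = outputs[-1]
    if b is None:
        raise ValueError("the execution does not end in a consensus")
    # Indices of configurations that are not of consensus b.
    bad = [i for i, out in enumerate(outputs) if out != b]
    return bad[-1] + 1 if bad else 0
```

For execution (1) of Sect. 2 the outputs are `[None, None, 1]`, so the number of steps to convergence is 2.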


[Figure: two plots against the population size (5 to 25). Left: percentage of configurations. Right: average number of steps to consensus.] 


Fig. 2. Statistics for 5000 random executions of the approximate majority protocol 
of [2], of length at most 40, from initial configurations of size at most 25. The left plot 
shows the percentage of executions reaching a consensus (dark green: lasting correct, 
light green: correct, light red: incorrect, dark red: lasting incorrect) and no consensus 
(orange). In this example the occurrences of light red are negligible. The right plot 
shows the average number of steps to convergence. (Color figure online) 
[Screenshot of the verification result: “The protocol does not satisfy correctness. 
Peregrine found a finite execution $\pi$ from initial configuration $C_0$ to configuration $C_1$ that violates 
correctness. The protocol should reach consensus true from $C_0$, but instead $\pi$ reaches $C_1$ which is 
terminal and not in a consensus. Configurations $C_0$ and $C_1$ contain 2 agents, and execution $\pi$ has 
length 1.” The panel offers to show, replay, or export the counter-example.] 


Fig. 3. Verification of the majority protocol of Sect. 2 without transition $d: y\,n \mapsto y\,y$. 


Verification. PEREGRINE can automatically verify that a population proto- 
col computes a given predicate. Predicates can be specified by the user in 
quantifier-free Presburger arithmetic extended with the family of predicates 
$\{x \equiv y \pmod{c}\}_{c \geq 2}$, which is equivalent to Presburger arithmetic. For example, 
for the majority protocol of Sect. 2, the user simply specifies C[Y] >= C[N]. 

PEREGRINE implements the approach of [6] to verify correctness of protocols 
which are silent. A protocol is said to be silent if from every initial configuration, 
every fair execution leads to a terminal configuration. The majority protocol of 
Sect. 2 and most existing protocols from the literature are silent [6]. We briefly 
describe the approach of [6] and how it is integrated into PEREGRINE. 

Suppose we are given a population protocol P and we wish to determine 
whether it computes a predicate $\varphi$. The procedure first tries to prove that P 
is silent. This is done by verifying a more restricted condition called layered 
termination. Verifying the latter property reduces to testing satisfiability of a 
Presburger arithmetic formula. If this formula holds, then the protocol is silent, 
otherwise no conclusion is derived. However, essentially all existing silent proto- 
cols satisfy layered termination [6]. 

Once P is proven to be silent, the procedure attempts to prove that no “bad 
execution” exists. More precisely, it checks whether there exist configurations $C_0$ 
and $C_1$ such that $C_0 \xrightarrow{*} C_1$, $C_0$ is initial, $C_1$ is terminal, and $C_1$ is not in consensus 
$\varphi(C_0) \in \{0,1\}$. Since reachability is not definable in Presburger arithmetic, a 
Presburger-definable over-approximation $\overset{*}{\dashrightarrow}$ of reachability, borrowed from Petri 
net theory, is used instead. We obtain the following formula $\Phi_{\text{bad-exec}}$: 


$$\exists C_0, C_1 :\; C_0 \overset{*}{\dashrightarrow} C_1 \;\land\; \bigwedge_{q \notin I} C_0[q] = 0 \;\land\; \bigwedge_{t \in T} \mathrm{succ}(C_1, t) \subseteq \{C_1\} \;\land\; \bigvee_{q \in C_1} \big(O(q) = \neg\varphi(C_0)\big).$$


If $\Phi_{\text{bad-exec}}$ is unsatisfiable, then P is correct. Otherwise, no conclusion is reached, 
and $\Phi_{\text{bad-exec}}$ is iteratively strengthened by enriching the over-approximation $\overset{*}{\dashrightarrow}$. 
Whenever $\Phi_{\text{bad-exec}}$ is satisfied by $(C_0, C_1)$, PEREGRINE calls the model checker 
LoLA to test whether $C_1$ is indeed reachable from $C_0$. If so, then PEREGRINE 
reports P to be incorrect, and generates a counter-example execution, which can 
be replayed or exported as a JSON file (see Fig. 3). 
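
To make the notion of a bad execution concrete, the sketch below searches for one by explicit enumeration over all initial configurations up to a given size. This is not the approach of [6]: it uses neither the Presburger over-approximation nor an SMT solver, and it says nothing about larger populations. The function names and the bound `max_size` are assumptions of the example.

```python
from collections import Counter
from itertools import combinations_with_replacement

def successors(config, T):
    """Configurations obtained by firing one transition that alters config."""
    out = []
    for pre, post in T:
        need = Counter(pre)
        if all(config[q] >= k for q, k in need.items()):
            nxt = config - need + Counter(post)
            if nxt != config:
                out.append(nxt)
    return out

def find_bad_execution(T, I, O, phi, max_size):
    """Return (C0, C1) with C0 initial, C1 terminal, reachable from C0 and
    not of consensus phi(C0); return None if no such pair exists for
    populations of at most max_size agents."""
    for size in range(1, max_size + 1):
        for states in combinations_with_replacement(sorted(I), size):
            c0 = Counter(states)
            expected = phi(c0)
            seen, stack = {frozenset(c0.items())}, [c0]
            while stack:
                c = stack.pop()
                succ = successors(c, T)
                if not succ and any(O[q] != expected for q in c):
                    return c0, c
                for nxt in succ:
                    key = frozenset(nxt.items())
                    if key not in seen:
                        seen.add(key)
                        stack.append(nxt)
    return None

# Majority protocol of Sect. 2 and its variant without transition d.
MAJORITY = [(("Y", "N"), ("y", "n")), (("Y", "n"), ("Y", "y")),
            (("N", "y"), ("N", "n")), (("y", "n"), ("y", "y"))]
WITHOUT_D = MAJORITY[:3]
OUTPUT = {"Y": 1, "y": 1, "N": 0, "n": 0}
majority = lambda c: c["Y"] >= c["N"]
```

On `WITHOUT_D` the search returns the two-agent initial configuration with one Y and one N together with the terminal configuration with one y and one n, which matches the counter-example of Fig. 3; on `MAJORITY` it finds nothing up to the chosen bound.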
Currently PEREGRINE can verify protocols with up to a hundred states and 
a few thousand transitions. The bottleneck is the size of the constraint system. 
Due to lack of space, we refer the reader to [6] for detailed experimental results. 
Abstract. Approximate circuits with relaxed requirements on func- 
tional correctness play an important role in the development of resource- 
efficient computer systems. Designing approximate circuits is a very 
complex and time-demanding process trying to find optimal trade-offs 
between the approximation error and resource savings. In this paper, we 
present ADAC—a novel framework for automated design of approximate 
arithmetic circuits. ADAC integrates in a unique way efficient simula- 
tion and formal methods for approximate equivalence checking into a 
search-based circuit optimisation. To make ADAC easily accessible, it is 
implemented as a module of the ABC tool: a state-of-the-art system for 
circuit synthesis and verification. Within several hours, ADAC is able 
to construct high-quality Pareto sets of complex circuits (including even 
32-bit multipliers), providing useful trade-offs between the resource con- 
sumption and the error that is formally guaranteed. This demonstrates 
outstanding performance and scalability compared with other existing 
approaches. 


1 Introduction 


In recent years, reduction of power consumption of computer systems and 
mobile devices has become one of the biggest challenges in the computer indus- 
try. Approximate computing has been established as a new research field aim- 
ing at reducing system resource demands (and, in particular, power demands) 
by relaxing the requirement that all computations are always performed cor- 
rectly. Approximate computing exploits the fact that many applications, includ- 
ing image and multimedia processing, signal processing, data mining, machine 
learning, neural networks, and scientific computations, are error resilient, i.e. 
produce acceptable results even though the underlying computations are 
performed with a certain error. Therefore, the error can be used as a design metric 
and traded for chip area, power consumption, or runtime. Chippa et al. [7] claim 
that almost 80% of runtime is spent in procedures that could be approximated. 

Approximate computing can be conducted at different system levels with 
arithmetic circuit approximation being one of the most popular as such circuits 
are frequently used in the core computations. In our work, we focus on functional 
approximation where the original circuit is replaced by a less complex one which 
exhibits some errors but improves non-functional circuit parameters such as 
power consumption or chip area. Circuit approximation can be formulated as an 
optimisation problem where the error and non-functional circuit parameters are 
conflicting design objectives. Designing complex approximate circuits is a time- 
demanding and error-prone process. Moreover, its automation is challenging too 
since the design space including candidate solutions is huge and checking that a 
candidate solution has the required error is itself a computationally demanding 
task, especially if formal guarantees on the error have to be ensured. 

In this tool paper, we present ADAC¹, a novel framework for automated 
design of approximate circuits. The framework implements a design loop includ- 
ing (i) a generator of candidate solutions employing genetic search algorithms, 
(ii) an evaluator estimating non-functional parameters of a candidate solution, 
and (iii) a verifier checking that the candidate solution does not exceed the per- 
missible error. ADAC is integrated as a new module into the ABC tool—a state- 
of-the-art and widely used system for circuit synthesis and verification [1]. The 
framework takes as the inputs: 


— a golden combinational circuit in Verilog implementing the correct function- 
ality, 

— an error metric (such as the worst-case error, mean error, Hamming distance, 
etc.), 

— a threshold on the error metric representing the maximal permissible error, 

— a time limit on the overall design process, and 

— a file specifying sizes of gates available to the design process. 


With these inputs, ADAC searches for an approximate circuit satisfying the error 
threshold and having the minimal estimated chip area. Previous works [3, 14, 20, 
22] confirmed that the chip area is a good optimization objective as it highly 
correlates with power consumption, which is a crucial target in approximate 
computing. 

The results of [21] clearly demonstrate that search algorithms based on 
Cartesian Genetic Programming (CGP) [12] are well capable of generating 
high-quality approximate circuits. For complex circuits, however, a high num- 
ber of candidate solutions has to be generated and evaluated, which signifi- 
cantly limits the scalability of the design process. Our framework implements 
several approaches for error evaluation suitable for different error metrics and 
application domains. They include both SAT and BDD-based techniques for 


¹ https://github.com/imatyas/ADAC. 
approximate equivalence checking providing formal error guarantees as well 
as a bit-parallel circuit simulation utilising the computing power of modern 
processors. We also implement a novel search strategy that drives the search 
towards promptly verifiable approximate circuits, which significantly accelerates 
the design process in many cases [3]. As such, the framework offers a unique inte- 
gration of techniques based on simulation, formal reasoning, and evolutionary 
circuit optimisation. Our extensive experimental evaluation demonstrates that 
ADAC offers outstanding performance and scalability compared with existing 
methods and tools and paves a way towards an automated design process of 
complex provably-correct circuit approximations. 


2 Architecture and Implementation 


The ADAC framework has a modular architecture illustrated in Fig. 1. 

The setup phase is responsible mainly for preparing a chromosome represen- 
tation of the golden circuit. The circuit is given in a high-level Verilog format, 
which is first translated to a gate-level representation using the tool Yosys [25], 
and then the chromosome representation is obtained using our V2CH script. The 
setup phase is also responsible for generating a configuration file controlling the 
main design loop. It is generated from the user inputs and optional parameters 
for CGP and search strategies. 
Fig. 1. A scheme of the ADAC architecture. 


The design loop consists of three components: (i) a generator of candidate 
designs, (ii) an evaluator of non-functional parameters of the candidate circuit 
(currently estimating the chip area), and (iii) a verifier evaluating the candidate 
error. The chip area and the error form a basis of the fitness function, whose 
value is minimised via our search strategy. In particular, the fitness is infinity 
if the circuit error exceeds the given threshold, and the chip area otherwise. In 
the future, we plan to support a more general specification of the fitness. As an 
additional feature, ADAC can also quantify the difference (in the given metric) 
between two given circuits. 
The real values of non-functional parameters, such as the chip area or the 
power-delay product (PDP), depend on the target technology, and the synthesis 
of an optimal implementation of the given circuit using the target technology is 
highly time-consuming. Therefore, our design loop currently uses the chip area 
as the sole non-functional parameter. The chip area is estimated as the sum 
of the sizes of the gates of the circuit, which are given as one of the inputs of 
ADAC. The chip area is typically a good estimate of the power consumption [3, 
14,20,22]. The output of ADAC (in the gate-level Verilog format) can be passed 
to industrial circuit design tools to obtain accurate circuit parameters for the 
target technology. In our experiments, we report PDP for the 45 nm technology 
synthesised by the Synopsys Design Compiler [19]. 
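
The fitness function and the area estimate described above amount to the following sketch. The gate names and sizes in the usage note are hypothetical placeholders for the gate-size file that ADAC takes as input.

```python
import math

def estimated_area(gates, gate_sizes):
    """Chip area estimated as the sum of the sizes of the gates of the circuit."""
    return sum(gate_sizes[g] for g in gates)

def fitness(gates, error, threshold, gate_sizes):
    """Infinity if the error exceeds the threshold, the estimated area otherwise.
    The search minimises this value."""
    if error > threshold:
        return math.inf
    return estimated_area(gates, gate_sizes)
```

With assumed sizes `{"and": 2.0, "xor": 4.0}`, a circuit made of one XOR and one AND gate has fitness 6.0 when its error is within the threshold, and infinite fitness otherwise.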

We now briefly describe the candidate circuit generator and three methods 
for error evaluation that are currently supported in ADAC. 

The candidate circuit generator is based on CGP where a candidate solution 
is encoded as a chromosome describing a directed acyclic graph, given as a 
2-dimensional array of 2-input nodes. Every node is numbered and is encoded by 
3 integers where the first two numbers denote the inputs and the third represents 
the function of the node. New candidate circuits are obtained using a mutation 
operator that performs random changes in the chromosome. The mutations can 
either modify the node interconnection or functionality. The area of candidate 
circuits is reduced by making some nodes unreachable (such nodes, however, are 
removed only at the very end, and so they can still be mutated and even become 
reachable again). The candidates are evaluated, and the one with the best fitness 
is used in the next iteration of the design loop. The whole loop starts with 
the golden circuit and iteratively generates approximate solutions with better 
fitness values until a termination criterion (typically a given time limit) is met. 
Optionally, the user can provide an approximate circuit satisfying the threshold on the 
error as a seed to start with. 
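
A minimal sketch of such a chromosome is given below. It simplifies the encoding to a single row of nodes, fixes the circuit outputs instead of mutating them, and uses a hypothetical function table `FUNCS`; none of these choices is taken from the ADAC implementation.

```python
import random

# Hypothetical function table for single-bit signals: AND, OR, XOR, NAND.
FUNCS = [lambda a, b: a & b, lambda a, b: a | b,
         lambda a, b: a ^ b, lambda a, b: 1 - (a & b)]

def evaluate(chromosome, outputs, inputs):
    """Signals 0..len(inputs)-1 are the primary inputs; node k, a triple
    (in1, in2, func), defines signal len(inputs) + k."""
    values = list(inputs)
    for in1, in2, func in chromosome:
        values.append(FUNCS[func](values[in1], values[in2]))
    return [values[o] for o in outputs]

def active_nodes(chromosome, outputs, n_inputs):
    """Indices of nodes reachable from the outputs; the others add no area."""
    active, stack = set(), [o for o in outputs if o >= n_inputs]
    while stack:
        node = stack.pop() - n_inputs
        if node not in active:
            active.add(node)
            stack.extend(s for s in chromosome[node][:2] if s >= n_inputs)
    return active

def mutate(chromosome, n_inputs, rng):
    """Randomly change one gene of one node: a connection or the function."""
    child = [list(node) for node in chromosome]
    k = rng.randrange(len(child))
    gene = rng.randrange(3)
    if gene < 2:
        child[k][gene] = rng.randrange(n_inputs + k)  # only earlier signals: stays acyclic
    else:
        child[k][2] = rng.randrange(len(FUNCS))
    return [tuple(node) for node in child]

# Half adder on inputs 0 and 1: node 0 is the sum (XOR), node 1 the carry (AND);
# node 2 is currently unreachable but may become reachable after a mutation.
HALF_ADDER = [(0, 1, 2), (0, 1, 0), (2, 3, 1)]
```

Here `evaluate(HALF_ADDER, [2, 3], [1, 1])` yields sum 0 and carry 1, and `active_nodes(HALF_ADDER, [2, 3], 2)` contains only nodes 0 and 1.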

The bit-parallel circuit simulation supports all common error metrics, includ- 
ing the worst-case error (WCE), the mean error, the error rate representing the 
number of inputs leading to an incorrect output, and the Hamming distance. 
It utilises the power of modern processors by simulating the circuit on multiple 
input vectors (e.g. 64 inputs for 64-bit processors) in a single pass through the 
circuit [24]. However, despite the parallel processing that significantly accelerates 
the simulation, for circuits with arguments of larger bit-widths (beyond 12 bits), 
it is not feasible to simulate the circuits on all possible inputs, and so only 
statistical guarantees on the approximation error are provided. 
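
The idea of bit-parallel simulation can be illustrated with Python integers used as machine words: bit i of every word holds the value of one signal under the i-th input vector, so a single bitwise operation evaluates a gate for all packed vectors at once. The adder and the helper names below are assumptions chosen for the example, not ADAC code.

```python
def full_adder(a, b, cin):
    """One full adder evaluated on all packed input vectors simultaneously."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_adder(a_bits, b_bits):
    """n-bit ripple-carry adder on packed words; returns n+1 packed output bits."""
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]

def pack(values, n_bits):
    """Transpose integers into words: bit i of word k is bit k of values[i]."""
    return [sum(((v >> k) & 1) << i for i, v in enumerate(values))
            for k in range(n_bits)]

def unpack(words, count):
    """Inverse of pack: rebuild the count integers from the packed words."""
    return [sum(((w >> i) & 1) << k for k, w in enumerate(words))
            for i in range(count)]
```

Packing 64 operand pairs and calling `ripple_adder` once computes all 64 sums in one pass over the gates.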

The BDD-based evaluation also supports all common error metrics, and, 
unlike simulation, it is able to provide formal error guarantees for circuits with 
larger input bit-widths. For the purpose of the evaluation, the original correct 
circuit and its approximation are interconnected into an auxiliary circuit called 
a miter such that the error can be deduced from its output (e.g. to compute the 
error rate, the outputs of the golden and candidate circuits are subtracted, and 
the result is compared with 0). The miter is encoded as a BDD on which the 
circuit error is evaluated using BDD operations [22,23]. However, this technique 
does not scale well with the complexity of the circuits in terms of the number 
of their gates as the resulting BDD representation becomes prohibitively huge. 
Hence, this approach works well for large adders and similar circuits, but it fails, 
e.g., for multipliers beyond 12 bits. 

The SAT-based evaluation currently supports WCE only, but it provides for- 
mal guarantees and a superior performance to the BDD-based technique. ADAC 
implements a novel miter construction based on subtracting the outputs of the 
golden and approximate circuits, followed by a comparison with the error 
threshold [3]. The construction is optimised for SAT-based evaluation by avoiding long 
XOR chains known to cause poor performance of state-of-the-art SAT solvers [5, 
9]. This allows us to exploit the ABC engine iprove, designed originally for miter- 
based exact circuit equivalence checking, to quickly evaluate WCE. 
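
The question answered by the miter can be stated at the level of integers: the WCE bound holds exactly when no input makes the absolute difference of the two outputs exceed the threshold. The sketch below replaces the SAT query by exhaustive enumeration over a small adder; the approximate adder `approx_add` is a hypothetical example, not a circuit produced by ADAC.

```python
from itertools import product

def golden_add(x, y):
    return x + y

def approx_add(x, y, k=2):
    """Hypothetical approximation: the k lowest result bits are the bitwise OR
    of the operand bits, so carries out of the low part are dropped."""
    low_mask = (1 << k) - 1
    high = ((x >> k) + (y >> k)) << k
    return high | ((x | y) & low_mask)

def miter(x, y, threshold):
    """True iff the two circuits differ by more than the threshold on (x, y)."""
    return abs(golden_add(x, y) - approx_add(x, y)) > threshold

def wce_within(threshold, n_bits):
    """Exhaustive stand-in for the SAT query: the bound holds iff the miter
    is never true."""
    return not any(miter(x, y, threshold)
                   for x, y in product(range(1 << n_bits), repeat=2))
```

Since x + y = (x | y) + (x & y), the error of this `approx_add` is the AND of the two low parts, so for k = 2 the worst-case error is 3: `wce_within(3, 4)` holds and `wce_within(2, 4)` does not.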

The final ingredient of the design process is the search strategy. Apart from 
the standard evolutionary strategies based solely on the fitness function, ADAC 
also implements a novel verifiability-driven approach [3] combined with the SAT- 
based evaluation. 

The verifiability-driven search strategy uses a limit L on the resources avail- 
able to the underlying SAT decision procedure. The limit effectively controls the 
time the SAT solver can use. We require that every improving candidate has to 
be verifiable using the resource limit L. Therefore the strategy drives the search 
towards candidates that improve the fitness and can be promptly evaluated. As 
a result, we can evaluate a much larger set of candidate circuits in the given 
time. Our experiments indicate that this strategy often leads to a higher number 
of improving solutions and thus finds circuits having a smaller chip area meeting 
the permissible error. On the other hand, it can happen that, for a limit L, no 
improving sequence exists, while it exists for a slightly greater resource limit. We 
are currently implementing auto-adaptive techniques that should automatically 
select the adequate resource limit for the given circuit. 
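
The acceptance rule of the verifiability-driven strategy can be sketched as a generic loop. The interface (`mutate`, `area`, `verify` returning `"holds"`, `"violated"` or `"timeout"`) and the toy instance in the usage note are assumptions made for illustration; they do not reflect the ADAC or ABC API.

```python
def verifiability_driven_search(seed, mutate, area, verify, limit, iterations):
    """Accept a candidate only if it has a smaller area than the current best
    and the verifier proves the error bound within the resource limit."""
    best = seed
    for _ in range(iterations):
        candidate = mutate(best)
        if area(candidate) >= area(best):
            continue
        # Both "violated" and "timeout" lead to rejection of the candidate.
        if verify(candidate, limit) == "holds":
            best = candidate
    return best

# Toy instance: a "circuit" is just its gate count; the error bound holds for
# at least 5 gates, and the circuit with 7 gates is expensive to verify.
def toy_verify(gates, limit):
    cost = 100 if gates == 7 else 10
    if cost > limit:
        return "timeout"
    return "holds" if gates >= 5 else "violated"

def toy_search(limit):
    return verifiability_driven_search(
        seed=10, mutate=lambda g: g - 1, area=lambda g: g,
        verify=toy_verify, limit=limit, iterations=20)
```

In this toy instance `toy_search(50)` gets stuck at 8 gates because the only improving candidate times out, while `toy_search(200)` reaches 5 gates, mirroring the remark that an improving sequence may exist only for a slightly greater resource limit.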


Integration to the ABC Tool. To make ADAC easily accessible, it is imple- 
mented as a new module for the ABC tool. ABC allows us to support an impor- 
tant subset of the Verilog specification and implementation language. We also 
utilize ABC to translate the circuits among different intermediate representa- 
tions used for constructing miters. As mentioned before, we employ the iprove 
engine in our SAT-based method for evaluating the WCE. Note that iprove uses 
MiniSat [18] as the SAT solver. Despite the fact that ABC supports a BDD-based 
circuit representation and manipulation, we implemented our own BDD compo- 
nent (based on the BuDDy library [2]) that is tailored for evolutionary circuit 
approximation. 


Extensibility. Due to its modular architecture, ADAC can be easily extended. 
Apart from the extensions mentioned above, we are working on a new component 
for error evaluation based on SAT counting methods (e.g. #SAT [4]) that could 
offer formal guarantees and a better scalability for the mean error and error-rate 
metrics, and on new candidate circuit generators based on counter-examples produced 
during the verification of candidate circuits. In the long term, we plan 
to generalise the underlying methods and also support the design of approximate 
sequential circuits. 


3 Evaluation, Related Works, and Applications 


We first compare the performance of the different methods of circuit error eval- 
uation supported in ADAC. For that, we use results from adder approximation 
obtained from 10 runs, each for 5 min. The table in Fig. 2 shows average runtimes 
of a single error evaluation using the bit-parallel simulation, the BDD-based app- 
roach, and the SAT-based approach. The reported speedups are with respect to 
the simulation. We can see that the simulation provides the best performance for 
small bit-widths only, but it does not scale well. The SAT-based method offers 
the best scalability and dominates for larger circuits, but it supports the WCE 
evaluation only. The BDD-based method, like simulation, supports all metrics 
and significantly outperforms the simulation for larger circuits. Note that, for 
more complex circuits such as multipliers, we would observe similar results with 
a worse relative performance of the BDD-based approach. 

There also exist other known methods for computing approximation 
errors for arithmetic circuits, including methods based on BDDs [6] or a SAT- 
based miter solution [5]. Compared to ADAC, these methods are less scalable, 
which is demonstrated by the fact that they have been used for approximating 
multipliers limited to 8-bit operands and adders limited to 16-bit operands only. 
Apart from that, there are efficient methods for exact equivalence checking based 
on algebraic computations [8,16]. However, no such methods are known so far for 
approximate equivalence checking. 


                     Bit-width of the arguments 
                     w=6        w=10       w=14 
Simulation           210 µs     76 ms      31.23 s 
BDD (WCE)            350 µs     12 ms      0.38 s 
  speedup            0.59x      6.04x      80.74x 
BDD (mean error)     370 µs     13 ms      0.79 s 
  speedup            0.59x      5.72x      38.94x 
SAT (WCE)            920 µs     1.4 ms     1.7 ms 
  speedup            0.23x      53.7x      18468x 

[Plot: PDP against worst-case error [%] for M1 to M5 and ADAC.] 


Fig. 2. (Left) Performance of error evaluation methods for adders. (Right) A compari- 
son of 16-bit approximate multipliers designed by ADAC vs. the best known solutions. 


Next, we compare the quality of approximate circuits obtained using ADAC 
with circuits that appeared in the literature. We consider 16-bit multipliers 
since existing approaches are not able to handle larger and more complex cir- 
cuits. The different points in Fig. 2 correspond to circuits with different trade- 
offs between WCE in % and the power-delay product (PDP²), which is a key 
non-functional circuit characteristic. These circuits were obtained using vari- 
ous existing approaches including: (M1) configurable circuits from the lpACLib 
library [17], (M2) the bit-significance-driven logic compression [15], (M3) the 
bit-width truncation [10], (M4) compositional techniques [11], and (M5) circuits 
from the EvoApproxLib library [13]. We can see that just the bit-width trun- 
cation can provide a quality of results comparable with ADAC (in terms of the 
PDP reduction for the given WCE), but for large target errors (20% WCE or 
more) only. For small target errors, ADAC clearly dominates. 

Note that, for each target WCE, we performed 30 independent runs of CGP 
to obtain statistically significant results. For each run, ADAC was executed for 
2 h on an Intel Xeon X5670 2.4 GHz processor using a single core. Also note that 
the individual runs are independent and thus can be easily parallelised. 

Further, Fig. 3 presents approximate multipliers up to 32 bits obtained by 
ADAC. It shows Pareto fronts representing circuits with different compromises 
between WCE in % and PDP, and demonstrates that ADAC goes beyond capabilities 
of existing methods and tools. For each target WCE, ADAC was executed for 
4 hours in the case of the 24-bit instances and for 6 hours in the case of the 
larger instances. Note that a 32-bit exact multiplier requires over 6,300 gates, 
and, to the best of our knowledge, ADAC is the first tool that is able to 
approximate such complex circuits with formal error guarantees. 

[Plot: PDP against worst-case error [%]; the curves include 16-bit and 32-bit multipliers.] 

Fig. 3. Approximate multipliers designed by ADAC. 100% refers to PDP of the 
accurate circuits for the given bit-width. 

Besides the approaches mentioned above, there also exist general-purpose 
methods, such as SALSA [14] or SASIMI [15], approximating circuits indepen- 
dently of their structure. We were unable to perform a direct comparison with 
them because their implementations are not available, but based on the published 
results, ADAC is able to provide significantly better scalability. 


Practical Impacts. The following list briefly characterises several resource- 
aware applications that build on approximate circuits. The circuits were obtained 
using prototype implementations of the above-mentioned approaches that are 
now integrated in ADAC. 


Approximate multipliers for convolutional neural networks [14]. In such net- 
works, millions of multiplications have to be performed. The usage of application- 
specific approximate multipliers led to 90% savings in terms of power consump- 
tion of the data path for a negligible drop in classification accuracy. 


2 PDP characterises both the speed and energy efficiency of the circuit. 
Approximate Adders and Subtractors for a Discrete Convolutional Transforma- 
tion [22]. These adders and subtractors were designed to reduce the power con- 
sumption in video compression for the High Efficiency Video Coding (HEVC) 
standard. They show better quality/power trade-offs than implementations avail- 
able in the literature. For example, a 25% power reduction for the same error 
was obtained in comparison with a recent highly-optimised implementation. 


Approximate Adders and Multipliers for Image Processing [20]. These circuits 
were used in the development of efficient hardware implementations of filters and 
edge detectors. A 50% reduction was observed in the number of look-up tables 
used in a field programmable gate array for a negligible drop in the image visual 
quality. 
Abstract. Simple stochastic games can be solved by value iteration 
(VI), which yields a sequence of under-approximations of the value of 
the game. This sequence is guaranteed to converge to the value only in 
the limit. Since no stopping criterion is known, this technique does not 
provide any guarantees on its results. We provide the first stopping cri- 
terion for VI on simple stochastic games. It is achieved by additionally 
computing a convergent sequence of over-approximations of the value, 
relying on an analysis of the game graph. Consequently, VI becomes an 
anytime algorithm returning the approximation of the value and the cur- 
rent error bound. As another consequence, we can provide a simulation- 
based asynchronous VI algorithm, which yields the same guarantees, but 
without necessarily exploring the whole game graph. 


1 Introduction 


Simple Stochastic Game. A simple stochastic game (SG) [Con92] is a zero-sum two-player game played 
on a graph by Maximizer and Minimizer, who choose actions in their respective 
vertices (also called states). Each action is associated with a probability distri- 
bution determining the next state to move to. The objective of Maximizer is 
to maximize the probability of reaching a given target state; the objective of 
Minimizer is the opposite. 

Stochastic games constitute a fundamental problem for several reasons. From 
the theoretical point of view, the complexity of this problem¹ is known to be 
in UP ∩ coUP [HK66], but no polynomial-time algorithm is known. Further, 
several other important problems can be reduced to SG, for instance parity 
games, mean-payoff games, discounted-payoff games and their stochastic exten- 
sions [CF11]. The task of solving SG is also polynomial-time equivalent to solv- 
ing perfect information Shapley, Everett and Gillette games [AM09]. Besides, 
the problem is practically relevant in verification and synthesis. SG can model 
reactive systems, with players corresponding to the controller of the system and 
to its environment, where quantified uncertainty is explicitly modelled. This is 
useful in many application domains, ranging from smart energy management 
[CFK+13a] to autonomous urban driving [CKSW13], robot motion planning 
[LaV00] to self-adaptive systems [CMG14]; for various recent case studies, see 
e.g. [SK16]. Finally, since Markov decision processes (MDP) [Put14] are a special 
case with only one player, SG can serve as abstractions of large MDP [KKNP10]. 

TUM - IAS, the Studienstiftung des deutschen Volkes project “Formal methods for 
111938, TUM IGSSE Grant 10.06 (PARSEC), and the German Research Foundation 
(DFG) project KR 4890/2-1 “Statistical Unbounded Verification”. 

1 Formally, the problem is to decide, for a given p ∈ [0,1], whether Maximizer has a 
strategy ensuring probability at least p to reach the target. 


Solution Techniques. There are several classes of algorithms for solving SG, 
most importantly strategy iteration (SI) algorithms [HK66] and value iteration 
(VI) algorithms [Con92]. Since the repetitive evaluation of strategies in SI is 
often slow in practice, VI is usually preferred, similarly to the special case of 
MDPs [KM17]. For instance, the most used probabilistic model checker PRISM 
[KNP11] and its branch PRISM-Games [CFK+13a] use VI for MDP and SG 
as the default option, respectively. However, while SI is in principle a precise 
method, VI is an approximative method, which converges only in the limit. 
Unfortunately, there is no known stopping criterion for VI applied to SG. Conse- 
quently, there are no guarantees on the results returned in finite time. Therefore, 
current tools stop when the difference between the two most recent approxima- 
tions is low, and thus may return arbitrarily imprecise results [HM17]. 


Value Iteration with Guarantees. In the special case of MDP, in order to 
obtain bounds on the imprecision of the result, one can employ a bounded variant 
of VI [MLG05, BCC+14] (also called interval iteration [HM17]). Here one com- 
putes not only an under-approximation, but also an over-approximation of the 
actual value as follows. On the one hand, iterative computation of the least fix- 
point of Bellman equations yields an under-approximating sequence converging 
to the value. On the other hand, iterative computation of the greatest fixpoint 
yields an over-approximation, which, however, does not converge to the value. 
Moreover, it often results in the trivial bound of 1. A solution suggested for 
MDPs [BCC+14,HM17] is to modify the underlying graph, namely to collapse 
end components. In the resulting MDP there is only one fixpoint, thus the least 
and greatest fixpoint coincide and both approximating sequences converge to 
the actual value. In contrast, for general SG no procedure where the greatest 
fixpoint converges to the value is known. In this paper we provide one, yielding 
a stopping criterion. We show that the pre-processing approach of collapsing is 
not applicable in general and provide a solution on the original graph. We also 
characterize SG where the fixpoints coincide and no processing is needed. The 
main technical challenge is that states in an end component in SG can have 
different values, in contrast to the case of MDP. 
Practical Efficiency Using Guarantees. We further utilize the obtained 
guarantees to practically improve our algorithm. Similar to the MDP 
case [BCC+14], the quantification of the error allows for ignoring parts of the 
state space, and thus a speed up without jeopardizing the correctness of the 
result. Indeed, we provide a technique where some states are not explored and 
processed at all, but their potential effect is still taken into account. The informa- 
tion is further used to decide the states to be explored next and to be analyzed 
in more detail. To this end, simulations and learning are used as tools. While 
for MDP this idea has already demonstrated speed ups in orders of magnitude 
[BCC+14, ACD+17], this paper provides the first technique of this kind for SG. 
Our contribution is summarized as follows: 


— We introduce a VI algorithm yielding both under- and over-approximation 
sequences, both of which converge to the value of the game. Thus we present 
the first stopping criterion for VI on SG and the first anytime algorithm 
with guaranteed precision. We also characterize when a simpler solution is 
sufficient. 

— We provide a learning-based algorithm, which preserves the guarantees, but 
is in some cases more efficient since it avoids exploring the whole state space. 

— We evaluate the running times of the algorithms experimentally, concluding 
that obtaining guarantees requires an overhead that is either negligible or 
mitigated by the learning-based approach. 


Related Work. The works closest to ours are the following. As mentioned 
above, [BCC+14, HM17] describe the solution to the special case of MDP. While 
[BCC+14] also provides a learning-based algorithm, [HM17] discusses the con- 
vergence rate and the exact solution. The basic algorithm of [HM17] is imple- 
mented in PRISM [BKL+17] and the learning approach of [BCC+14] in STORM 
[DJKV17a]. The extension for SG where the interleaving of players is severely 
limited (every end component belongs to one player only) is discussed in [Ujm15]. 

Further, in the area of probabilistic planning, bounded real-time dynamic 
programming [MLG05] is related to our learning-based approach. However, it 
is limited to the setting of stopping MDP where the target sink or the non- 
target sink is reached almost surely under any pair of strategies and thus the 
fixpoints coincide. Our algorithm works for general SG, not only for stopping 
ones, without any blowup. 

For SG, the tools implementing the standard SI and/or VI algorithms are 
PRISM-games [CFK+13a], GAVS+ [CKLB11] and GIST [CHJR10]. The latter 
two are, however, neither maintained nor accessible via the links provided in 
their publications any more. 

Apart from fundamental algorithms to solve SG, there are various practically 
efficient heuristics that, however, provide no or only weak guarantees, often based 
on some form of learning [BT00, LL08, WT16, TT16, AY17, BBS08]. Finally, the 
only currently available way to obtain any guarantees through VI is to perform 
$\gamma^2$ iterations and then round to the nearest multiple of $1/\gamma$, yielding the value 
of the game with precision $1/\gamma$ [CH08]; here $\gamma$ cannot be freely chosen, but it 
is a fixed number, exponential in the number of states and the used probability 
denominators. However, since the precision cannot be chosen and the number of 
iterations is always exponential, this approach is infeasible even for small games. 


Organization of the Paper. Section 2 introduces the basic notions and reviews 
value iteration. Section 3 explains the idea of our approach on an example. 
Section 4 provides a full technical treatment of the method as well as the learning- 
based variation. Section 5 discusses experimental results and Sect. 6 concludes. 
The appendix (available in [KKKW18]) gives technical details on the pseudocode 
as well as the conducted experiments and provides more extensive proofs of the 
theorems and lemmata; in this paper, there are only proof sketches and ideas. 


2 Preliminaries 


2.1 Basic Definitions 


A probability distribution on a finite set $X$ is a mapping $\delta : X \to [0,1]$, such 
that $\sum_{x \in X} \delta(x) = 1$. The set of all probability distributions on $X$ is denoted 
by $\mathcal{D}(X)$. Now we define stochastic games, in the literature often referred to as simple 
stochastic games or stochastic two-player games with a reachability objective. 


Definition 1 (SG). A stochastic game (SG) is a tuple $(S, S_{\Box}, S_{\bigcirc}, s_0, A, 
\mathsf{Av}, \delta, \mathbf{1}, \mathbf{0})$, where $S$ is a finite set of states partitioned into the sets $S_{\Box}$ and 
$S_{\bigcirc}$ of states of the player Maximizer and Minimizer, respectively, $s_0, \mathbf{1}, \mathbf{0} \in S$ 
is the initial state, target state, and sink state, respectively, $A$ is a finite set 
of actions, $\mathsf{Av} : S \to 2^A$ assigns to every state a set of available actions, and 
$\delta : S \times A \to \mathcal{D}(S)$ is a transition function that given a state $s$ and an action 
$a \in \mathsf{Av}(s)$ yields a probability distribution over successor states. 
A Markov decision process (MDP) is a special case of SG where $S_{\bigcirc} = \emptyset$. 


We assume that SGs are non-blocking, so for all states $s$ we have $\mathsf{Av}(s) \neq \emptyset$. 
Further, $\mathbf{1}$ and $\mathbf{0}$ only have one action and it is a self-loop with probability 1. 
Additionally, we can assume that the SG is preprocessed so that all states with 
no path to $\mathbf{1}$ are merged with $\mathbf{0}$. 

For a state $s$ and an available action $a \in \mathsf{Av}(s)$, we denote the set of successors 
by $\mathsf{Post}(s, a) := \{s' \mid \delta(s, a, s') > 0\}$. Finally, for any set of states $T \subseteq S$, we use 
$T_{\Box}$ and $T_{\bigcirc}$ to denote the states in $T$ that belong to Maximizer and Minimizer, 
whose states are drawn in the figures as $\Box$ and $\bigcirc$, respectively. 
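
As an illustration of Definition 1 and the assumptions above, the following Python sketch shows one possible encoding of an SG. The class name, its field names, and the dictionary layout are assumptions of this sketch, not notation from the paper. The example instance is an assumed reading of the MDP of Fig. 1 (left), with action c branching uniformly to the target, the sink, and back to t.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class StochasticGame:
    maximizer: set      # S_box
    minimizer: set      # S_circle
    initial: str        # s_0
    target: str         # the target state 1
    sink: str           # the sink state 0
    delta: dict         # (state, action) -> {successor: probability}

    def states(self):
        return self.maximizer | self.minimizer

    def available(self, s):
        """Av(s): the actions with a transition defined in s."""
        return {a for (t, a) in self.delta if t == s}

    def post(self, s, a):
        """Post(s, a): successors reached with positive probability."""
        return {t for t, p in self.delta[(s, a)].items() if p > 0}

    def is_mdp(self):
        return not self.minimizer

    def is_well_formed(self):
        """Partition, non-blocking states, and proper distributions."""
        if self.maximizer & self.minimizer:
            return False
        if any(not self.available(s) for s in self.states()):
            return False
        return all(sum(d.values()) == 1 and set(d) <= self.states()
                   for d in self.delta.values())


THIRD = Fraction(1, 3)
ONE = Fraction(1)
example = StochasticGame(
    maximizer={"s", "t", "1", "0"}, minimizer=set(),
    initial="s", target="1", sink="0",
    delta={("s", "a"): {"t": ONE},
           ("t", "b"): {"s": ONE},
           ("t", "c"): {"1": THIRD, "0": THIRD, "t": THIRD},
           ("1", "loop"): {"1": ONE},
           ("0", "loop"): {"0": ONE}})
```

Exact rationals are used so that the check that each distribution sums to 1 is not affected by rounding.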

The semantics of SG is given in the usual way by means of strategies and the 
induced Markov chain and the respective probability space, as follows. An infi- 
nite path $\rho$ is an infinite sequence $\rho = s_0 a_0 s_1 a_1 \cdots \in (S \times A)^{\omega}$, such that for every 
$i \in \mathbb{N}$, $a_i \in \mathsf{Av}(s_i)$ and $s_{i+1} \in \mathsf{Post}(s_i, a_i)$. Finite paths are defined analogously as 
elements of $(S \times A)^{*} \times S$. Since this paper deals with the reachability objective, 
we can restrict our attention to memoryless strategies, which are optimal for this 
objective. We still allow randomizing strategies, because they are needed for the 
learning-based algorithm later on. A strategy of Maximizer or Minimizer is a 
function $\sigma : S_{\Box} \to \mathcal{D}(A)$ or $S_{\bigcirc} \to \mathcal{D}(A)$, respectively, such that $\sigma(s) \in \mathcal{D}(\mathsf{Av}(s))$ 
for all $s$. We call a strategy deterministic if it maps to Dirac distributions only. 
Note that there are finitely many deterministic strategies. A pair $(\sigma, \tau)$ of strate- 
gies of Maximizer and Minimizer induces a Markov chain $G^{\sigma,\tau}$ where the transi- 
tion probabilities are defined as $\delta(s, s') = \sum_{a \in \mathsf{Av}(s)} \sigma(s, a) \cdot \delta(s, a, s')$ for states of 
Maximizer and analogously for states of Minimizer, with $\sigma$ replaced by $\tau$. The 
Markov chain induces a unique probability distribution $\mathbb{P}_s^{\sigma,\tau}$ over measurable 
sets of infinite paths [BK08, Chap. 10]. 
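
The construction of the induced Markov chain can be sketched in a few lines of Python. The function name and the dictionary encoding are assumptions of this sketch, and the example model is an assumed reading of the MDP of Fig. 1 (left). The two strategies are passed as one merged mapping from states to distributions over actions.

```python
from fractions import Fraction


def induced_chain(delta, strategies):
    """Transition probabilities of the Markov chain induced by a strategy pair.

    delta:      (state, action) -> {successor: probability}
    strategies: state -> {action: probability}; it merges sigma (on
                Maximizer states) and tau (on Minimizer states).
    """
    chain = {}
    for s, choice in strategies.items():
        row = {}
        for a, weight in choice.items():
            for succ, p in delta[(s, a)].items():
                row[succ] = row.get(succ, Fraction(0)) + weight * p
        chain[s] = row
    return chain


THIRD, HALF, ONE = Fraction(1, 3), Fraction(1, 2), Fraction(1)
FIG1 = {("s", "a"): {"t": ONE},
        ("t", "b"): {"s": ONE},
        ("t", "c"): {"1": THIRD, "0": THIRD, "t": THIRD},
        ("1", "loop"): {"1": ONE},
        ("0", "loop"): {"0": ONE}}

# A randomizing strategy: in t, play b and c with probability 1/2 each.
SIGMA = {"s": {"a": ONE}, "t": {"b": HALF, "c": HALF},
         "1": {"loop": ONE}, "0": {"loop": ONE}}
CHAIN = induced_chain(FIG1, SIGMA)
```

Each row of the result is again a probability distribution, so the chain can be analysed with standard Markov chain techniques.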

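We write $\Diamond \mathbf{1} := \{\rho \mid \exists i \in \mathbb{N}.\ \rho(i) = \mathbf{1}\}$ to denote the (measurable) set of all 
paths which eventually reach $\mathbf{1}$. For each $s \in S$, we define the value in $s$ as 


$$V(s) := \sup_{\sigma} \inf_{\tau} \mathbb{P}_s^{\sigma,\tau}(\Diamond \mathbf{1}) = \inf_{\tau} \sup_{\sigma} \mathbb{P}_s^{\sigma,\tau}(\Diamond \mathbf{1}),$$


where the equality follows from [Mar75]. We are interested not only in $V(s_0)$, 
but also its $\varepsilon$-approximations and the corresponding ($\varepsilon$-)optimal strategies for 
both players. 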

Now we recall a fundamental tool for analysis of MDP called end components. 
We introduce the following notation. Given a set of states $T \subseteq S$, a state $s \in T$ 
and an action $a \in \mathsf{Av}(s)$, we say that $(s,a)$ exits $T$ if $\mathsf{Post}(s, a) \not\subseteq T$. We define 
an end component of a SG as the end component of the underlying MDP with 
both players unified. 


Definition 2 (EC). A non-empty set $T \subseteq S$ of states is an end component 
(EC) if there is a non-empty set $B \subseteq \bigcup_{s \in T} \mathsf{Av}(s)$ of actions such that 


1. for each $s \in T$, $a \in B \cap \mathsf{Av}(s)$ we do not have $(s, a)$ exits $T$, 
2. for each $s, s' \in T$ there is a finite path $w = s a_0 \ldots a_n s' \in (T \times B)^{*} \times T$, i.e. 
the path stays inside $T$ and only uses actions in $B$. 


Intuitively, ECs correspond to bottom strongly connected components of the 
Markov chains induced by possible strategies, so for some pair of strategies all 
possible paths starting in the EC remain there. An end component T is a maximal 
end component (MEC) if there is no other end component $T'$ such that $T \subseteq T'$. 
Given an SG G, the set of its MECs is denoted by MEC(G) and can be computed 
in polynomial time [CY95]. 
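
The two conditions of Definition 2 can be checked directly for a given pair of sets. The following Python sketch does so by a simple graph search; it is not the polynomial-time MEC decomposition of [CY95], and the function name and encoding are assumptions of this sketch. The example model is an assumed reading of the MDP of Fig. 1 (left).

```python
from fractions import Fraction


def is_end_component(delta, T, B):
    """Check Definition 2 for a state set T and an action set B."""
    T, B = set(T), set(B)
    avail = {s: {a for (t, a) in delta if t == s} for s in T}
    if not T or not B or not B <= set().union(*avail.values()):
        return False
    # Condition 1: no chosen action leaves T.
    for s in T:
        for a in B & avail[s]:
            if any(p > 0 and succ not in T
                   for succ, p in delta[(s, a)].items()):
                return False
    # Condition 2: every state of T reaches every state of T via B.
    for start in T:
        seen, stack = {start}, [start]
        while stack:
            s = stack.pop()
            for a in B & avail[s]:
                for succ, p in delta[(s, a)].items():
                    if p > 0 and succ not in seen:
                        seen.add(succ)
                        stack.append(succ)
        if seen != T:
            return False
    return True


THIRD, ONE = Fraction(1, 3), Fraction(1)
FIG1 = {("s", "a"): {"t": ONE},
        ("t", "b"): {"s": ONE},
        ("t", "c"): {"1": THIRD, "0": THIRD, "t": THIRD},
        ("1", "loop"): {"1": ONE},
        ("0", "loop"): {"0": ONE}}
```

In this encoding, {s, t} with actions {a, b} is an EC, whereas adding c to the action set violates the first condition because (t, c) exits the set.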


2.2 (Bounded) Value Iteration 


The value function V satisfies the following system of equations, which is referred 
to as the Bellman equations: 


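$$V(s) = \begin{cases} \max_{a \in \mathsf{Av}(s)} V(s,a) & \text{if } s \in S_{\Box} \\ \min_{a \in \mathsf{Av}(s)} V(s,a) & \text{if } s \in S_{\bigcirc} \\ 1 & \text{if } s = \mathbf{1} \\ 0 & \text{if } s = \mathbf{0} \end{cases} \qquad (1)$$

where² 

$$V(s,a) := \sum_{s' \in S} \delta(s,a,s') \cdot V(s') \qquad (2)$$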


Moreover, V is the least solution to the Bellman equations, see e.g. [CH08]. 
To compute the value of V for all states in an SG, one can thus utilize the 
iterative approximation method value iteration (VI) as follows. We start with a 
lower bound function $L_0 : S \to [0,1]$ such that $L_0(\mathbf{1}) = 1$ and, for all other $s \in S$, 
$L_0(s) = 0$. Then we repetitively apply Bellman updates (3) and (4) 


$$L_n(s,a) := \sum_{s' \in S} \delta(s,a,s') \cdot L_{n-1}(s') \qquad (3)$$

$$L_n(s) := \begin{cases} \max_{a \in \mathsf{Av}(s)} L_n(s,a) & \text{if } s \in S_{\Box} \\ \min_{a \in \mathsf{Av}(s)} L_n(s,a) & \text{if } s \in S_{\bigcirc} \end{cases} \qquad (4)$$


until convergence. Note that convergence may happen only in the limit even for 
such a simple game as in Fig. 1 on the left. The sequence is monotonic, at all 
times a lower bound on $V$, i.e. $L_i(s) \le V(s)$ for all $s \in S$, and the least fixpoint 
satisfies $L^* := \lim_{n \to \infty} L_n = V$. 
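
A minimal Python sketch of value iteration from below is given next. The function names and the dictionary encoding are assumptions of this sketch, and the model is an assumed reading of the MDP of Fig. 1 (left), in which action c leads to the target, to the sink, and back to t with probability 1/3 each. Exact rationals make the monotone approach to the value 1/2 visible.

```python
from fractions import Fraction

THIRD, ONE = Fraction(1, 3), Fraction(1)
FIG1 = {("s", "a"): {"t": ONE},
        ("t", "b"): {"s": ONE},
        ("t", "c"): {"1": THIRD, "0": THIRD, "t": THIRD},
        ("1", "loop"): {"1": ONE},
        ("0", "loop"): {"0": ONE}}
MAXIMIZER = {"s", "t", "1", "0"}   # an MDP: every state is Maximizer's


def bellman_update(bound, delta, maximizer):
    """One synchronous application of updates (3) and (4)."""
    new = {}
    for s in bound:
        values = [sum(p * bound[succ] for succ, p in dist.items())
                  for (t, a), dist in delta.items() if t == s]
        new[s] = max(values) if s in maximizer else min(values)
    return new


def lower_bounds(delta, maximizer, target, steps):
    """Return the list [L_0, L_1, ..., L_steps]."""
    states = {s for (s, _) in delta}
    current = {s: Fraction(int(s == target)) for s in states}
    sequence = [current]
    for _ in range(steps):
        current = bellman_update(current, delta, maximizer)
        sequence.append(current)
    return sequence


SEQUENCE = lower_bounds(FIG1, MAXIMIZER, "1", 3)
```

Under this encoding the lower bound of t stays strictly below 1/2 after every finite number of steps, which illustrates convergence only in the limit.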

Unfortunately, there is no known stopping criterion, i.e. no guarantees how 
close the current under-approximation is to the value [HM17]. The current tools 
stop when the difference between two successive approximations is smaller than 
a certain threshold, which can lead to arbitrarily wrong results [HM17]. 

For the special case of MDP, it has been suggested to also compute the 
greatest fixpoint [MLG05] and thus an upper bound as follows. The function 
$G : S \to [0, 1]$ is initialized for all states $s \in S$ as $G_0(s) = 1$ except for $G_0(\mathbf{0}) = 0$. 
Then we repetitively apply updates (3) and (4), where $L$ is replaced by $G$. The 
resulting sequence $G_n$ is monotonic, provides an upper bound on $V$ and the 
greatest fixpoint $G^* := \lim_{n \to \infty} G_n$ is the greatest solution to the Bellman equations 
on $[0, 1]^S$. 

This approach is called bounded value iteration (BVI) (or bounded real- 
time dynamic programming (BRTDP) [MLG05,BCC+14] or interval iteration 
[HM17]). If $L^* = G^*$ then they are both equal to $V$ and we say that BVI con- 
verges. BVI is guaranteed to converge in MDP if the only ECs are those of 
$\mathbf{1}$ and $\mathbf{0}$ [BCC+14]. Otherwise, if there are non-trivial ECs they have to be 
“collapsed”³. Computing the greatest fixpoint on the modified MDP results in 
another sequence $U_i$ of upper bounds on $V$, converging to $U^* := \lim_{n \to \infty} U_n$. Then 
BVI converges even for general MDPs, $U^* = V$ [BCC+14], when transformed 
this way. The next section illustrates this difficulty and the solution through 
collapsing on an example. 


2 Throughout the paper, for any function $f : S \to [0, 1]$ we overload the notation and 
also write $f(s,a)$ meaning $\sum_{s' \in S} \delta(s,a,s') \cdot f(s')$. 

3 All states of an EC are merged into one, all leaving actions are preserved and all 
other actions are discarded. For more detail see [KKKW18, Appendix A.1]. 
In summary, all versions of BVI discussed so far and later on in the paper 
follow the pattern of Algorithm 1. In the naive version, UPDATE just performs 
the Bellman update on L and U according to Eqs. (3) and (4).⁴ For a general 
MDP, U does not converge to V, but to $G^*$, and thus the termination criterion 
may never be met if $G^*(s_0) - V(s_0) > 0$. If the ECs are collapsed in pre-processing 
then U converges to V. 

For the general case of SG, the collapsing approach fails and this paper pro- 
vides another version of BVI where U converges to V, based on a more detailed 
structural analysis of the game. 


Algorithm 1. Bounded value iteration algorithm 


procedure BVI(precision ε > 0) 
    for s ∈ S do                     \* Initialization *\ 
        L(s) = 0                     \* Lower bound *\ 
        U(s) = 1                     \* Upper bound *\ 
    L(1) = 1                         \* Value of sinks is determined a priori *\ 
    U(0) = 0 
    repeat 
        UPDATE(L, U)                 \* Bellman updates or their modification *\ 
    until U(s0) − L(s0) < ε          \* Guaranteed error bound *\ 
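
The pattern of Algorithm 1 can be written as a short runnable Python sketch. Here UPDATE is instantiated with the plain Bellman update only, so termination is guaranteed only when U converges to V; the iteration cap `max_iterations`, the function name, and the encoding are assumptions of this sketch. The example model is an assumed reading of the collapsed MDP of Fig. 1 (middle), with a single state `st` whose action c leads to the target, to the sink, and back with probability 1/3 each.

```python
def bvi(delta, maximizer, initial, target, sink, epsilon,
        max_iterations=10_000):
    """Bounded value iteration with the plain Bellman update as UPDATE.

    Returns (lower, upper, iterations) for the initial state; the last
    component is None if the cap was hit before U - L < epsilon.
    """
    states = {s for (s, _) in delta}
    lower = {s: 0.0 for s in states}
    upper = {s: 1.0 for s in states}
    lower[target] = 1.0              # value of the target is known a priori
    upper[sink] = 0.0                # value of the sink is known a priori

    def update(bound):
        new = {}
        for s in states:
            values = [sum(p * bound[t] for t, p in dist.items())
                      for (u, a), dist in delta.items() if u == s]
            new[s] = max(values) if s in maximizer else min(values)
        return new

    for iteration in range(1, max_iterations + 1):
        lower, upper = update(lower), update(upper)
        if upper[initial] - lower[initial] < epsilon:
            return lower[initial], upper[initial], iteration
    return lower[initial], upper[initial], None


COLLAPSED = {("st", "c"): {"1": 1 / 3, "0": 1 / 3, "st": 1 / 3},
             ("1", "loop"): {"1": 1.0},
             ("0", "loop"): {"0": 1.0}}
LOW, HIGH, STEPS = bvi(COLLAPSED, {"st", "1", "0"}, "st", "1", "0", 1e-6)
```

On this stopping model the returned interval encloses the value 1/2 and is narrower than the requested precision.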


3 Example 


In this section, we illustrate the issues preventing BVI convergence and our 
solution on a few examples. Recall that G is the sequence converging to the 
greatest solution of the Bellman equations, while U is in general any sequence 
over-approximating V that one or another BVI algorithm suggests. 

Firstly, we illustrate the issue that arises already for the special case of MDP. 
Consider the MDP of Fig. 1 on the left. Although $V(s) = V(t) = 0.5$, we have 
$G_i(s) = G_i(t) = 1$ for all $i$. Indeed, the upper bound for $t$ is always updated 
as the maximum of $G_i(t,c)$ and $G_i(t,b)$. Although $G_i(t,c)$ decreases over time, 
$G_i(t,b)$ remains the same, namely equal to $G_i(s)$, which in turn remains equal to 
$G_i(s,a) = G_i(t)$. This cyclic dependency lets both s and t remain in an “illusion” 
that the value of the other one is 1. 
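
This behaviour can be reproduced with a few lines of Python. The sketch below assumes a particular reading of Fig. 1 (left), namely that action c leads to the target, to the sink, and back to t with probability 1/3 each; the function name and encoding are likewise assumptions. It iterates the Bellman update from above and from below.

```python
from fractions import Fraction

THIRD, ONE = Fraction(1, 3), Fraction(1)
FIG1 = {("s", "a"): {"t": ONE},
        ("t", "b"): {"s": ONE},
        ("t", "c"): {"1": THIRD, "0": THIRD, "t": THIRD},
        ("1", "loop"): {"1": ONE},
        ("0", "loop"): {"0": ONE}}


def iterate_max(bound, delta, steps):
    """Apply the Bellman update of an all-Maximizer model `steps` times."""
    for _ in range(steps):
        bound = {s: max(sum(p * bound[t] for t, p in dist.items())
                        for (u, a), dist in delta.items() if u == s)
                 for s in bound}
    return bound


G0 = {"s": ONE, "t": ONE, "1": ONE, "0": Fraction(0)}
L0 = {"s": Fraction(0), "t": Fraction(0), "1": ONE, "0": Fraction(0)}
G100 = iterate_max(G0, FIG1, 100)
L100 = iterate_max(L0, FIG1, 100)
```

After 100 updates the upper bounds of s and t are still exactly 1, while the lower bounds are already very close to 1/2.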

The solution for MDP is to remove this cyclic dependency by collapsing all 
MECs into singletons and removing the resulting purely self-looping actions. 
Figure 1 in the middle shows the MDP after collapsing the EC {s, t}. This turns 
the MDP into a stopping one, where $\mathbf{1}$ or $\mathbf{0}$ is under any strategy reached with 
probability 1. In such MDP, there is a unique solution to the Bellman equations. 
Therefore, the greatest fixpoint is equal to the least one and thus to V. 


4 For the straightforward pseudocode, see [KKKW18, Appendix A.2]. 
Secondly, we illustrate the issues that additionally arise for general SG. It 
turns out that the collapsing approach can be extended only to games where 
all states of each EC belong to one player only [Ujm15]. In this case, both 
Maximizer’s and Minimizer’s ECs are collapsed the same way as in MDP. 

However, when both players are present in an EC, then collapsing may not 
solve the issue. Consider the SG of Fig. 2. Here $\alpha$ and $\beta$ represent the values of 
the respective actions.⁵ There are three cases: 

First, let $\alpha < \beta$. If the bounds converge to these values we eventually observe 
$G_i(q,e) < L_i(r,f)$ and learn the induced inequality. Since p is a Minimizer’s state 
it will never pick the action leading to the greater value of $\beta$. Therefore, we can 
safely merge p and q, and remove the action leading to r, as shown in the second 
subfigure. 

Second, if $\alpha > \beta$, p and r can be merged in an analogous way, as shown in 
the third subfigure. 

Third, if $\alpha = \beta$, both previous solutions as well as collapsing all three states 
as in the fourth subfigure is possible. However, since the approximants may only 
converge to $\alpha$ and $\beta$ in the limit, we may not know in finite time which of these 
cases applies and thus cannot decide for any of the collapses. 

Consequently, the approach of collapsing is not applicable in general. In order 
to ensure BVI convergence, we suggest a different method, which we call deflat- 
ing. It does not involve changing the state space, but rather decreasing the upper 
bound $U_i$ to the least value that is currently provable (and thus still correct). To 
this end, we analyze the exiting actions, i.e. with successors outside of the EC, 
for the following reason. If the play stays in the EC forever, the target is never 
reached and Minimizer wins. Therefore, Maximizer needs to pick some exiting 
action to avoid staying in the EC. 



Fig. 1. Left: An MDP (as special case of SG) where BVI does not converge due to the 
grayed EC. Middle: The same MDP where the EC is collapsed, making BVI converge. 
Right: The approximations illustrating the convergence of the MDP in the middle. 


5 Precisely, we consider them to stand for a probabilistic branching with probability 
$\alpha$ (or $\beta$) to $\mathbf{1}$ and with the remaining probability to $\mathbf{0}$. To avoid clutter in the figure, 
we omit this branching and depict only the value. 
Fig. 2. Left: Collapsing ECs in SG may lead to incorrect results. The Greek letters on 
the leaving arrows denote the values of the exiting actions. Right three figures: Correct 
collapsing in different cases, depending on the relationship of $\alpha$ and $\beta$. In contrast to 
MDP, some actions of the EC exiting the collapsed part have to be removed. 


For the EC with the states s and t in Fig. 1, the only exiting action is c. In 
this example, since c is the only exiting action, $U_i(t,c)$ is the highest possible 
upper bound that the EC can achieve. Thus, by decreasing the upper bound of 
all states in the EC to that number, we still have a safe upper bound. Moreover, 
with this modification BVI converges in this example, intuitively because now 
the upper bound of t depends on action c as it should. 
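
The effect of this modification on the example can be sketched in Python. This is only an illustration for an all-Maximizer EC given explicitly, not the general procedure developed later in the paper; the function names and the encoding are assumptions, as is the reading of Fig. 1 (left) in which action c leads to the target, to the sink, and back to t with probability 1/3 each.

```python
from fractions import Fraction

THIRD, ONE = Fraction(1, 3), Fraction(1)
FIG1 = {("s", "a"): {"t": ONE},
        ("t", "b"): {"s": ONE},
        ("t", "c"): {"1": THIRD, "0": THIRD, "t": THIRD},
        ("1", "loop"): {"1": ONE},
        ("0", "loop"): {"0": ONE}}


def action_value(bound, dist):
    """Expected bound of the successors of one action."""
    return sum(p * bound[t] for t, p in dist.items())


def best_exit_max(bound, delta, T):
    """Largest value of an action leaving T (all states are Maximizer's)."""
    exits = [action_value(bound, dist) for (s, a), dist in delta.items()
             if s in T and any(p > 0 and t not in T for t, p in dist.items())]
    return max(exits, default=Fraction(0))


def update_and_deflate(upper, delta, T):
    """One Bellman update of U followed by deflating the EC T."""
    upper = {s: max(action_value(upper, dist)
                    for (u, a), dist in delta.items() if u == s)
             for s in upper}
    cap = best_exit_max(upper, delta, T)
    return {s: min(v, cap) if s in T else v for s, v in upper.items()}


U = {"s": ONE, "t": ONE, "1": ONE, "0": Fraction(0)}
HISTORY = []
for _ in range(3):
    U = update_and_deflate(U, FIG1, {"s", "t"})
    HISTORY.append(U["t"])
```

Under these assumptions the upper bound of t now decreases step by step towards 1/2 instead of staying at 1.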

For the example in Fig. 2, it is correct to decrease the upper bound to the 
maximal exiting one, i.e. $\max\{\hat{\alpha}, \hat{\beta}\}$, where $\hat{\alpha} := U_i(a)$, $\hat{\beta} := U_i(b)$ are the cur- 
rent approximations of $\alpha$ and of $\beta$. However, this itself does not ensure BVI 
convergence. Indeed, if for instance $\hat{\alpha} < \hat{\beta}$ then deflating all states to $\hat{\beta}$ is not 
tight enough, as values of p and q can even be bounded by $\hat{\alpha}$. In fact, we have 
to find a certain sub-EC that corresponds to $\hat{\alpha}$, in this case {p,q} and set all its 
upper bounds to $\hat{\alpha}$. We define and compute these sub-ECs in the next section. 

In summary, the general structure of our convergent BVI algorithm is to 
produce the sequence U by application of Bellman updates and occasionally find 
the relevant sub-ECs and deflate them. The main technical challenge is that 
states in an EC in SG can have different values, in contrast to the case of MDP. 


4 Convergent Over-Approximation 


In Sect. 4.1, we characterize SGs where Bellman equations have more solutions. 
Based on the analysis, subsequent sections show how to alter the procedure 
computing the sequence $G_i$ over-approximating V so that the resulting tighter 
sequence $U_i$ still over-approximates V, but also converges to V. This ensures that 
the BVI modified in this way converges. Section 4.4 presents the learning-based variant of 
our BVI. 


6 We choose the name “deflating” to evoke decreasing the overly high “pressure” in 
the EC until it equalizes with the actual “pressure” outside. 
4.1 Bloated End Components Cause Non-convergence 


As we have seen in the example of Fig. 2, BVI generally does not converge due to 
ECs with a particular structure of the exiting actions. The analysis of ECs relies 
on the extremal values that can be achieved by exiting actions (in the example, 
$\alpha$ and $\beta$). Given the value function $V$ or just its current over-approximation $U_i$, 
we define the most profitable exiting action for Maximizer (denoted by $\Box$) and 
Minimizer (denoted by $\bigcirc$) as follows. 


Definition 3 (bestExit). Given a set of states T ⊆ S and a function f : S →
[0,1] (see footnote 2), the f-value of the best T-exiting action of Maximizer and
Minimizer, respectively, is defined as

\[
  \mathsf{bestExit}^{\Box}_{f}(T) = \max_{\substack{s \in T \cap S_{\Box} \\ (s,a)\ \text{exits}\ T}} f(s,a)
\]

\[
  \mathsf{bestExit}^{\circ}_{f}(T) = \min_{\substack{s \in T \cap S_{\circ} \\ (s,a)\ \text{exits}\ T}} f(s,a)
\]

with the convention that max_∅ = 0 and min_∅ = 1.
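
As an illustration of Definition 3, the following Python sketch computes both bestExit values on a small hypothetical game (it is not the game of Fig. 2). It assumes that f(s,a) is the expectation of f over the successors of (s,a), and that (s,a) exits T when some successor lies outside T. The dictionary encoding and all names are illustrative only.

```python
# Hypothetical two-state game: p belongs to Maximizer, q to Minimizer.
# DELTA[(state, action)] maps each successor to its probability.
DELTA = {
    ("p", "stay"): {"q": 1.0},
    ("p", "a"): {"target": 0.4, "sink": 0.6},
    ("q", "stay"): {"p": 1.0},
    ("q", "b"): {"target": 0.9, "sink": 0.1},
}
OWNER = {"p": "max", "q": "min"}


def action_value(f, s, a):
    """f(s, a): expectation of f over the successors of (s, a) (assumed extension)."""
    return sum(prob * f[t] for t, prob in DELTA[(s, a)].items())


def best_exit(f, T, player):
    """f-value of the best T-exiting action of the given player."""
    values = [
        action_value(f, s, a)
        for (s, a), succ in DELTA.items()
        if s in T and OWNER[s] == player and not set(succ) <= T
    ]
    # Conventions of Definition 3: max over the empty set is 0, min is 1.
    if player == "max":
        return max(values, default=0.0)
    return min(values, default=1.0)


f = {"p": 1.0, "q": 1.0, "target": 1.0, "sink": 0.0}
exit_max = best_exit(f, {"p", "q"}, "max")
exit_min = best_exit(f, {"p", "q"}, "min")
print(exit_max, exit_min)
```

In this toy game Minimizer's best exit is worth more than Maximizer's, so the component {p,q} has exactly the shape that the rest of this section is concerned with.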


Example 1. In the example of Fig. 2 on the left with T = {p,q,r} and α < β,
we have bestExit^□_V(T) = β, bestExit^○_V(T) = 1. It is due to β < 1 that BVI does
not converge here. We generalize this in the following lemma. △


Lemma 1. Let T be an EC. For every m satisfying bestExit^□_V(T) ≤ m ≤
bestExit^○_V(T), there is a solution f : S → [0,1] to the Bellman equations, which
on T is constant and equal to m.


Proof (Idea). Intuitively, such a constant m is a solution to the Bellman equa- 
tions on T for the following reasons. As both players prefer getting m to exiting 
and getting “only” the values of their respective bestExit, they both choose to 
stay in the EC (and the extrema in the Bellman equations are realized on non- 
exiting actions). On the one hand, Maximizer (Bellman equations with max) 
is hoping for the promised m, which is however not backed up by any actions 
actually exiting towards the target. On the other hand, Minimizer (Bellman 
equations with min) does not realize that staying forever results in her optimal 
value 0 instead of m. 


Corollary 1. If bestExit^○_V(T) > bestExit^□_V(T) for some EC T, then G* ≠ V.


Proof. Since there are m_1, m_2 such that bestExit^□_V(T) ≤ m_1 < m_2 ≤
bestExit^○_V(T), by Lemma 1 there are two different solutions to the Bellman
equations. In particular, G* > L* = V, and BVI does not converge.
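
Lemma 1 can be checked numerically on a small case. The sketch below uses a hypothetical EC T = {p,q} in which Maximizer's only exit is worth 0.4 and Minimizer's only exit is worth 0.9; these numbers and the encoding are assumptions made for illustration. It tests which constants m are fixpoints of the Bellman update on T.

```python
# Hypothetical EC T = {p, q}; p is Maximizer's state, q is Minimizer's state.
# Exit action a of p is worth 0.4, exit action b of q is worth 0.9.
EXIT_MAX, EXIT_MIN = 0.4, 0.9


def bellman(f):
    """One Bellman update on T; each state either stays (moves to the other) or exits."""
    return {
        "p": max(f["q"], EXIT_MAX),  # Maximizer: stay or take exit a
        "q": min(f["p"], EXIT_MIN),  # Minimizer: stay or take exit b
    }


def is_fixpoint(m):
    """True if the function that is constantly m on T solves the Bellman equations."""
    f = {"p": m, "q": m}
    return bellman(f) == f


inside = [m for m in (0.4, 0.5, 0.75, 0.9) if is_fixpoint(m)]
outside = [m for m in (0.3, 0.95) if is_fixpoint(m)]
print(inside, outside)
```

Every tested constant between the two exit values is a solution, while the constants outside that interval are not, which matches the statement of the lemma on this instance.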


In accordance with our intuition that ECs satisfying the above inequality 
should be deflated, we call them bloated. 
Definition 4 (BEC). An EC T is called a bloated end component (BEC), if
bestExit^○_V(T) > bestExit^□_V(T).


Example 2. In the example of Fig. 2 on the left with α < β, the ECs {p,q} and
{p,q,r} are BECs. △


Example 3. If an EC T has no exiting actions of Minimizer (or no Minimizer’s
states at all, as in an MDP), then bestExit^○_V(T) = 1 (the case with min_∅). Hence
all numbers between bestExit^□_V(T) and 1 are a solution to the Bellman equations
and G*(s) = 1 for all states s ∈ T.

Analogously, if Maximizer does not have any exiting action in T, then it
holds that bestExit^□_V(T) = 0 (the case with max_∅), T is a BEC and all numbers
between 0 and bestExit^○_V(T) are a solution to the Bellman equations.

Note that in MDP all ECs belong to one player, namely Maximizer. Consequently,
all ECs are BECs except for ECs where Maximizer has an exiting action
with value 1; all other ECs thus have to be collapsed (or deflated) to ensure BVI
convergence in MDPs. Interestingly, all non-trivial ECs in MDPs are a problem,
while in SGs through the presence of the other player some ECs can converge,
namely if both players want to exit (see e.g. [KKKW18, Appendix A.3]). △


We show that BECs are indeed the only obstacle for BVI convergence.

Theorem 1. If the SG contains no BECs except for {o} and {1}, then G* = V.


Proof (Sketch). Assume, towards a contradiction, that there is some state s 
with a positive difference G*(s) — V(s) > 0. Consider the set D of states with 
the maximal difference. D can be shown to be an EC. Since it is not a BEC 
there has to be an action exiting D and realizing the optimum in that state. 
Consequently, this action also has the maximal difference, and all its successors, 
too. Since some of the successors are outside of D, we get a contradiction with 
the maximality of D. 


In Sect. 4.2, we show how to eliminate BECs by collapsing their “core” parts, 
called below MSECs (maximal simple end components). Since MSECs can only 
be identified with enough information about V, Sect. 4.3 shows how to avoid
direct a priori collapsing and instead dynamically deflate candidates for MSECs 
in a conservative way. 


4.2 Static MSEC Decomposition 


Now we turn our attention to SGs with BECs. Intuitively, since in a BEC all
Minimizer’s exiting actions have a higher value than what Maximizer can achieve,
Minimizer does not want to use any of her own exiting actions and prefers staying
in the EC (or steering Maximizer towards his worse exiting actions). Consequently,
only Maximizer wants to take an exiting action. In the MDP case he
can pick any desirable one. Indeed, he can wait until he reaches a state where
it is available. As a result, in MDP all states of an EC have the same value
and can all be collapsed into one state. In the SG case, he may be restricted
by Minimizer’s behaviour or even not given any chance to exit the EC at all.
As a result, a BEC may contain several parts (below denoted MSECs), each
with a different value, intuitively corresponding to different exits. Thus instead of
MECs, we have to decompose into finer MSECs and only collapse these.


Definition 5 (Simple EC). An EC T is called simple (SEC), if for all s ∈ T
we have V(s) = bestExit^□_V(T).
A SEC C is maximal (MSEC) if there is no SEC C′ such that C ⊊ C′.


Intuitively, an EC is simple, if Minimizer cannot keep Maximizer away from 
his bestExit. Independently of Minimizer's decisions, Maximizer can reach the 
bestExit almost surely, unless Minimizer decides to leave, in which case Maxi- 
mizer could achieve an even higher value. 


Example 4. Assume α < β in the example of Fig. 2. Then {p,q} is a SEC and an
MSEC. Further observe that action c is sub-optimal for Minimizer and removing
it does not affect the value of any state, but simplifies the graph structure.
Namely, it decomposes the whole EC into several (here only one) SECs and some
non-EC states (here r). △


Algorithm 2, called FIND_MSEC, shows how to compute MSECs. It returns
the set of all MSECs if called with parameter V. However, later we also call this
function with other parameters f : S → [0,1]. The idea of the algorithm is the
following. The set X consists of Minimizer's sub-optimal actions, leading to a
higher value. As such they cannot be a part of any SEC and thus should be
ignored when identifying SECs. (The previous example illustrates that ignoring
X is indeed safe as it does not change the value of the game.) We denote the
game G where the available actions Av are changed to the new available actions
Av′ (ignoring the Minimizer's sub-optimal ones) as G_[Av/Av′]. Once removed,
Minimizer has no choices to affect the value and thus each EC is simple.


Algorithm 2. FIND_MSEC
1: function FIND_MSEC(f : S → [0,1])
2:   X ← {(s, {a ∈ Av(s) | f(s,a) > f(s)}) | s ∈ S_○}
3:   Av′ ← Av \ X    \* Minimizer’s f-suboptimal actions removed *\
4:   return MEC(G_[Av/Av′])    \* MEC(G_[Av/Av′]) are MSECs of the original G *\
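
The action-removal step (lines 2 and 3 of Algorithm 2) can be sketched in Python as follows. The game is a hypothetical two-state example, the dictionary encoding is an assumption, and the final MEC decomposition of line 4 is deliberately left out; any standard MEC algorithm could be run on the pruned action sets.

```python
# Hypothetical game: p is Maximizer's state, q is Minimizer's state.
DELTA = {
    ("p", "stay"): {"q": 1.0},
    ("p", "a"): {"target": 0.4, "sink": 0.6},
    ("q", "stay"): {"p": 1.0},
    ("q", "b"): {"target": 0.9, "sink": 0.1},
}
OWNER = {"p": "max", "q": "min"}
AV = {"p": {"stay", "a"}, "q": {"stay", "b"}}


def action_value(f, s, a):
    """f(s, a): expectation of f over the successors of (s, a) (assumed extension)."""
    return sum(prob * f[t] for t, prob in DELTA[(s, a)].items())


def remove_suboptimal(f):
    """Return Av', i.e. Av without Minimizer's actions a with f(s, a) > f(s)."""
    av_new = {}
    for s, actions in AV.items():
        if OWNER[s] == "min":
            av_new[s] = {a for a in actions if action_value(f, s, a) <= f[s]}
        else:
            av_new[s] = set(actions)  # Maximizer's actions are never removed
    return av_new


# In this toy game both p and q have value 0.4 (Maximizer exits via a).
V = {"p": 0.4, "q": 0.4, "target": 1.0, "sink": 0.0}
av_prime = remove_suboptimal(V)
print(av_prime)
```

Here Minimizer's exit b is removed because it is worth more than the value of q, so in the pruned game the component {p,q} is left with Maximizer's exit only.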


Lemma 2 (Correctness of Algorithm 2). T ∈ FIND_MSEC(V) if and only
if T is a MSEC.


Proof (Sketch). “If”: As T is an MSEC, all states in T have the value
bestExit^□_V(T), and hence also all actions that stay inside T have this value.
Thus, no action that stays in T is removed by Line 3 and it is still a MEC
in the modified game.

“Only if”: If T ∈ FIND_MSEC(V), then T is a MEC of the game where
the suboptimal available actions (those in X) of Minimizer have been removed.
Hence for all s ∈ T : V(s) = bestExit^□_V(T), because intuitively Minimizer has
no possibility to influence the value any further, since all actions that could do
so were in X and have been removed. Since T is a MEC in the modified game,
it certainly is an EC in the original game. Hence T is a SEC. The inclusion
maximality follows from the fact that we compute MECs in the modified game.
Thus T is an MSEC.


Remark 1 (Algorithm with an oracle). In Sect. 3, we have seen that collapsing
MECs does not ensure BVI convergence. Collapsing does not preserve the values, 
since in BECs we would be collapsing states with different values. Hence we want 
to collapse only MSECs, where the values are the same. If, moreover, we remove 
X in such a collapsed SG, then there are no (non-sink) ECs and BVI converges 
on this SG to the original value. 


The difficulty with this algorithm is that it requires an oracle to compare
values, for instance a sufficiently precise approximation of V. Consequently, we
cannot pre-compute the MSECs, but have to find them while running BVI.
Moreover, since the approximations converge only in the limit we may never be
able to conclude on simplicity of some ECs. For instance, if α = β in Fig. 2,
and if the approximations converge at different speeds, then Algorithm 2 always
outputs only a part of the EC, although the whole EC on {p,q,r} is simple.

In MDPs, all ECs are simple, because there is no second player to be resolved 
and all states in an EC have the same value. Thus for MDPs it suffices to collapse 
all MECs, in contrast to SG. 


4.3 Dynamic MSEC Decomposition 


Since MSECs cannot be identified from approximants of V for sure, we refrain
from collapsing⁷ and instead only decrease the over-approximation in the
corresponding way. We call the method deflating, by which we mean decreasing the
upper bound of all states in an EC to its bestExit^□_U, see Algorithm 3. The
procedure DEFLATE (called on the current upper bound U_i) decreases this upper
bound to the minimum possible value according to the current approximation
and thus prevents states from only depending on each other, as in SECs.
Intuitively, it gradually approximates SECs and performs the corresponding
adjustments, but does not commit to any of the approximations.


Algorithm 3. DEFLATE
1: function DEFLATE(EC T, f : S → [0,1])
2:   for s ∈ T do
3:     f(s) ← min(f(s), bestExit^□_f(T))    \* Decrease the upper bound *\
4:   return f


⁷ Our subsequent method can be combined with local collapsing whenever the lower
and upper bounds on V are conclusive.
Lemma 3 (DEFLATE is sound). For any f : S → [0,1] such that f ≥ V and
any EC T, DEFLATE(T, f) ≥ V.
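
A minimal Python sketch of DEFLATE on a hypothetical two-state game is given below. The encoding is an assumption, and so is the choice to compute Maximizer's best exit once before any state is lowered and to return a fresh dictionary instead of updating f in place.

```python
# Hypothetical game (not the game of Fig. 2): p is Maximizer's, q is Minimizer's.
DELTA = {
    ("p", "stay"): {"q": 1.0},
    ("p", "a"): {"target": 0.4, "sink": 0.6},
    ("q", "stay"): {"p": 1.0},
    ("q", "b"): {"target": 0.9, "sink": 0.1},
}
OWNER = {"p": "max", "q": "min"}


def best_exit_max(f, T):
    """Best T-exiting value of Maximizer under f (0 if Maximizer has no exit)."""
    exits = (
        sum(prob * f[t] for t, prob in succ.items())
        for (s, _a), succ in DELTA.items()
        if s in T and OWNER[s] == "max" and not set(succ) <= T
    )
    return max(exits, default=0.0)


def deflate(T, f):
    """Cap f on every state of T by Maximizer's best exit; f itself is not mutated."""
    bound = best_exit_max(f, T)  # computed once, before any state is lowered
    return {s: min(v, bound) if s in T else v for s, v in f.items()}


U = {"p": 1.0, "q": 1.0, "target": 1.0, "sink": 0.0}
U_deflated = deflate({"p", "q"}, U)
print(U_deflated)
```

On this instance the upper bounds of p and q drop from 1 to the value of Maximizer's exit, while the bounds of the states outside T are untouched.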


This allows us to define our BVI algorithm as the naive BVI with only the
additional lines 3-4, see Algorithm 4.


Algorithm 4. UPDATE procedure for bounded value iteration on SG
1: procedure UPDATE(L : S → [0,1], U : S → [0,1])
2:   L, U get updated according to Eq. (3) and (4)    \* Bellman updates *\
3:   for T ∈ FIND_MSEC(L) do    \* Use lower bound to find ECs *\
4:     U ← DEFLATE(T, U)    \* and deflate the upper bound there *\


Theorem 2 (Soundness and completeness). Algorithm 1 (calling Algo- 
rithm 4) produces monotonic sequences L under- and U over-approximating V, 
and terminates. 


Proof (Sketch). The crux is to show that U converges to V. We assume towards
a contradiction, that there exists a state s with lim_{n→∞} U_n(s) − V(s) > 0. Then
there exists a nonempty set of states X where the difference between lim_{n→∞} U_n
and V is maximal. If the upper bound of states in X depends on states outside of
X, this yields a contradiction, because then the difference between upper bound
and value would decrease in the next Bellman update. So X must be an EC where
all states depend on each other. However, if that is the case, calling DEFLATE
decreases the upper bound to something depending on the states outside of X,
thus also yielding a contradiction.
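
The effect of deflating on convergence can be observed on a small hypothetical game with the Python sketch below. It is a simplification made for illustration: the call to FIND_MSEC(L) is replaced by a hard-coded candidate EC {p,q}, which stays sound by Lemma 3 but is not necessarily what FIND_MSEC would return in every round. All names and numbers are assumptions.

```python
# Hypothetical game: p is Maximizer's state, q is Minimizer's state.
DELTA = {
    ("p", "stay"): {"q": 1.0},
    ("p", "a"): {"target": 0.4, "sink": 0.6},
    ("q", "stay"): {"p": 1.0},
    ("q", "b"): {"target": 0.9, "sink": 0.1},
}
OWNER = {"p": "max", "q": "min"}


def action_value(f, s, a):
    return sum(prob * f[t] for t, prob in DELTA[(s, a)].items())


def bellman(f):
    """Synchronous Bellman update on all non-sink states."""
    g = dict(f)
    for s, who in OWNER.items():
        values = [action_value(f, s, a) for (t, a) in DELTA if t == s]
        g[s] = max(values) if who == "max" else min(values)
    return g


def deflate(T, f):
    """Cap f on T by the best T-exiting value of Maximizer."""
    exits = [
        action_value(f, s, a)
        for (s, a), succ in DELTA.items()
        if s in T and OWNER[s] == "max" and not set(succ) <= T
    ]
    bound = max(exits, default=0.0)
    return {s: min(v, bound) if s in T else v for s, v in f.items()}


def bvi(use_deflate, eps=1e-6, max_rounds=1000):
    L = {"p": 0.0, "q": 0.0, "target": 1.0, "sink": 0.0}
    U = {"p": 1.0, "q": 1.0, "target": 1.0, "sink": 0.0}
    for _ in range(max_rounds):
        L, U = bellman(L), bellman(U)
        if use_deflate:
            # Stand-in for FIND_MSEC(L): the candidate EC is hard-coded.
            U = deflate({"p", "q"}, U)
        if max(U[s] - L[s] for s in OWNER) < eps:
            break
    return L, U


L_plain, U_plain = bvi(use_deflate=False)
L_defl, U_defl = bvi(use_deflate=True)
gap_plain = U_plain["p"] - L_plain["p"]
gap_defl = U_defl["p"] - L_defl["p"]
print(gap_plain, gap_defl)
```

Without deflating, the upper bound gets stuck at the value of Minimizer's exit and the gap never closes; with deflating, both bounds meet at the value of Maximizer's exit after a few rounds.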


Summary of Our Approach: 


1. We cannot collapse MECs, because we cannot collapse BECs with non- 
constant values. 

2. If we remove X (the sub-optimal actions of Minimizer) we can collapse MECs 
(now actually MSECs with constant values).

3. Since we know neither X nor SECs we gradually deflate SEC approximations. 


4.4 Learning-Based Algorithm 


Asynchronous value iteration selects in each round a subset T ⊆ S of states
and performs the Bellman update in that round only on T. Consequently, it
may speed up computation if “important” states are selected. However, using 
the standard VI it is even more difficult to determine the current error bound. 
Moreover, if some states are not selected infinitely often the lower bound may 
not even converge. 

In the setting of bounded value iteration, the current error bound is known
for each state and thus convergence can easily be enforced. This gave rise to
asynchronous VI, such as BRTDP (bounded real time dynamic programming) in
the setting of stopping MDPs [MLG05], where the states are selected as those
that appear on a simulation run. Very similar is the adaptation for general MDP
[BCC+14]. In order to simulate a run, the transition probabilities determine how
to resolve the probabilistic choice. In order to resolve the non-deterministic choice
of Maximizer, the “most promising action” is taken, i.e., with the highest U. This
choice is derived from a reinforcement algorithm called delayed Q-learning and
ensures convergence while practically performing well [BCC+14].

In this section, we harvest our convergence results and BVI algorithm for SG, 
which allow us to trivially extend the asynchronous learning-based approach of 
BRTDP to SGs. On the one hand, the only difference to the MDP algorithm 
is how to resolve the choice for Minimizer. Since the situation is dual, we again 
pick the “most promising action”, in this case with the lowest L. On the other
hand, the only difference to Algorithm 1 calling Algorithm 4 is that the Bellman 
updates of U and L are performed on the states of the simulation run only, see 
lines 2-3 of Algorithm 5. 


Algorithm 5. Update procedure for the learning/BRTDP version of BVI on SG
1: procedure UPDATE(L : S → [0,1], U : S → [0,1])
2:   ρ ← path s_0, s_1, ..., s_ℓ of length ℓ ≤ k, obtained by simulation where the
     successor of s is s′ with probability δ(s,a,s′) and a is sampled randomly from
     arg max_a U(s,a) and arg min_a L(s,a) for s ∈ S_□ and s ∈ S_○, respectively
3:   L, U get updated by Eq. (3) and (4) on states s_ℓ, s_{ℓ−1}, ..., s_0    \* all s ∈ ρ *\
4:   for T ∈ FIND_MSEC(L) do
5:     DEFLATE(T, U)


If 1 or o is reached in a simulation, we can terminate it. It can happen that the 
simulation cycles in an EC. To that end, we have a bound k on the maximum 
number of steps. The choice of k is discussed in detail in [BCC+14] and we
use 2·|S| to guarantee the possibility of reaching sinks as well as exploring new
states. If the simulation cycles in an EC, the subsequent call of DEFLATE ensures 
that next time there is a positive probability to exit this EC. Further details can 
be found in [KKKW18, Appendix A.4]. 
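
The path sampling of line 2 of Algorithm 5 can be sketched in Python as follows. The game, the encoding, the uniform tie-breaking and the helper names are assumptions made for illustration; the bound k = 2·|S| follows the choice described above.

```python
import random

# Hypothetical game: p is Maximizer's state, q is Minimizer's state;
# "target" and "sink" play the role of the two sinks.
DELTA = {
    ("p", "stay"): {"q": 1.0},
    ("p", "a"): {"target": 0.4, "sink": 0.6},
    ("q", "stay"): {"p": 1.0},
    ("q", "b"): {"target": 0.9, "sink": 0.1},
}
OWNER = {"p": "max", "q": "min"}
AV = {"p": ["stay", "a"], "q": ["stay", "b"]}


def action_value(f, s, a):
    return sum(prob * f[t] for t, prob in DELTA[(s, a)].items())


def simulate(L, U, s0, k, rng):
    """Sample a path of at most k steps; stop early when a sink is reached."""
    path = [s0]
    s = s0
    while s in OWNER and len(path) - 1 < k:
        if OWNER[s] == "max":
            scores = {a: action_value(U, s, a) for a in AV[s]}
            best = max(scores.values())  # highest upper bound
        else:
            scores = {a: action_value(L, s, a) for a in AV[s]}
            best = min(scores.values())  # lowest lower bound
        a = rng.choice([b for b in AV[s] if scores[b] == best])
        succ = DELTA[(s, a)]
        s = rng.choices(list(succ), weights=list(succ.values()))[0]
        path.append(s)
    return path


L = {"p": 0.0, "q": 0.0, "target": 1.0, "sink": 0.0}
U = {"p": 1.0, "q": 1.0, "target": 1.0, "sink": 0.0}
path = simulate(L, U, "p", k=2 * len(L), rng=random.Random(0))
print(path)
```

With the initial bounds both players prefer to stay, so the sampled path cycles inside {p,q} until the step bound is hit. This is the situation in which the subsequent DEFLATE call is needed to make an exit attractive in the next simulation.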


5 Experimental Results 


We implemented both our algorithms as an extension of PRISM- 
games [CFK+13a], a branch of PRISM [KNP11] that allows for modelling 
SGs, utilizing previous work of [BCC+14, Ujm15] for MDP and SG with single- 
player ECs. We tested the implementation on the SGs from the PRISM-games 
case studies [gam] that have reachability properties and one additional model 
from [CKJ12] that was also used in [Ujm15]. We compared the results with both 
the explicit and the hybrid engine of PRISM-games, but since the models are
small both of them performed similarly and we only display the results of the
hybrid engine in Table 1.

Furthermore we ran experiments on MDPs from the PRISM benchmark 
suite [KNP12]. We compared our results there to the hybrid and explicit engine 
of PRISM, the interval iteration implemented in PRISM [HM17], the hybrid 
engine of STORM [DJKV17a] and the BRTDP implementation of [BCC+14]. 

Recall that the aim of the paper is not to provide a faster VI algorithm, but 
rather the first guaranteed one. Consequently, the aim of the experiments is not 
to show any speed ups, but to experimentally estimate the overhead needed for 
computing the guarantees. 

For information on the technical details of the experiments, all the models and 
the tables for the experiments on MDPs we refer to [KKKW18, Appendix B].
Note that although some of the SG models are parametrized they could only 
be scaled by manually changing the model file, which complicates extensive 
benchmarking. 

Although our approaches compute the additional upper bound to give the
convergence guarantees, for each of the experiments one of our algorithms
performed similarly to PRISM-games. Table 1 shows this result for three of the
four SG models in the benchmarking set. On the fourth model, PRISM's
pre-computations already solve the problem and hence it cannot be used to
compare the approaches. For completeness, the results are displayed in [KKKW18,
Appendix B.5].


Table 1. Experimental results for the experiments on SGs. The left two columns denote
the model and the given parameters, if present. Columns 3 to 5 display the verification
time in seconds for each of the solvers, namely PRISM-games (referred to as PRISM),
our BVI algorithm (BVI) and our learning-based algorithm (BRTDP). The next two
columns compare the number of states that BRTDP explored (#States_B) to the total
number of states in the model (#States). The rightmost column shows the number of
MSECs in the model (#MSECs).


Model | Parameters | PRISM | BVI | BRTDP | #States_B | #States | #MSECs
mdsm  | prop=1     |     8 |   8 |    17 |       767 |  62,245 |      1
mdsm  | prop=2     |     4 |   4 |    29 |       407 |  62,245 |      1
cdmsn |            |     2 |   2 |     3 |     1,212 |   1,240 |      1
cloud | N=5        |     3 |   7 |    15 |     1,302 |   8,842 |  4,421
cloud | N=6        |     6 |  59 |     4 |       570 |  34,954 | 17,477


Whenever there are few MSECs, as in mdsm and cdmsn, BVI performs like 
PRISM-games, because only little time is used for deflating. Apparently the 
additional upper bound computation takes very little time in comparison to the 
other tasks (e.g. parsing, generating the model, pre-computation) and does not 
slow down the verification significantly. For cloud, BVI is slower than PRISM-games,
because there are thousands of MSECs and deflating them takes over
80% of the time. This comes from the fact that we need to compute the expen- 
sive end component decomposition for each deflating step. BRTDP performs 
well for cloud, because in this model, as well as generally often if there are 
many MECs [BCC+14], only a small part of the state space is relevant for 
convergence. For the other models, BRTDP is slower than the deterministic 
approaches, because the models are so small that it is faster to first construct 
them completely than to explore them by simulation. 

Our more extensive experiments on MDPs compare the guaranteed 
approaches based on collapsing (i.e. learning-based from [BCC+14] and deter- 
ministic from [HM17]) to our guaranteed approaches based on deflating (so 
BRTDP and BVI). Since both learning-based approaches as well as both deter- 
ministic approaches perform similarly (see Table 2 in [KKKW18, Appendix B]), 
we conclude that collapsing and deflating are both useful for practical purposes, 
while the latter is also applicable to SGs. Furthermore we compared the usual 
unguaranteed value iteration of PRISM’s explicit engine to BVI and saw that 
our guaranteed approach did not take significantly more time in most cases. This 
strengthens the point that the overhead for the computation of the guarantees 
is negligible. 


6 Conclusions 


We have provided the first stopping criterion for value iteration on simple 
stochastic games and an anytime algorithm with bounds on the current error 
(guarantees on the precision of the result). The main technical challenge was 
that states in end components in SG can have different values, in contrast to 
the case of MDP. We have shown that collapsing is in general not possible, but 
we utilized the analysis to obtain the procedure of deflating, a solution on the 
original graph. Besides, whenever a SEC is identified for sure it can be collapsed 
and the two techniques of collapsing and deflating can thus be combined. 

The experiments indicate that the price to pay for the overhead to compute 
the error bound is often negligible. For each of the available models, at least one 
of our two implementations has performed similarly to or better than the standard
approach that yields no guarantees. Further, the obtained guarantees open the 
door to (e.g. learning-based) heuristics which treat only a part of the state space 
and can thus potentially lead to huge improvements. Surprisingly, already our 
straightforward adaptation of such an algorithm for MDP to SG yields inter- 
esting results, palliating the overhead of our non-learning method, despite the 
most naive implementation of deflating. Future work could reveal whether other 
heuristics or more efficient implementation can lead to huge savings as in the 
case of MDP [BCC+14]. 
Abstract. Computing reachability probabilities is at the heart of probabilistic
model checking. All model checkers compute these probabilities
in an iterative fashion using value iteration. This technique approximates 
a fixed point from below by determining reachability probabilities for an 
increasing number of steps. To avoid results that are significantly off, 
variants have recently been proposed that converge from both below 
and above. These procedures require starting values for both sides. We 
present an alternative that does not require the a priori computation 
of starting vectors and that converges faster on many benchmarks. The 
crux of our technique is to give tight and safe bounds—whose computa- 
tion is cheap—on the reachability probabilities. Lifting this technique to 
expected rewards is trivial for both Markov chains and MDPs. Exper- 
imental results on a large set of benchmarks show its scalability and 
efficiency. 


1 Introduction 


Markov decision processes (MDPs) [1,2] have their roots in operations research 
and stochastic control theory. They are frequently used for stochastic and 
dynamic optimization problems and are widely applicable in, e.g., stochastic 
scheduling and robotics. MDPs are also a natural model in randomized dis- 
tributed computing where coin flips by the individual processes are mixed with 
non-determinism arising from interleaving the processes’ behaviors. The central 
problem for MDPs is to find a policy that determines what action to take in 
the light of what is known about the system at the time of choice. The typical 
aim is to optimize a given objective, such as minimizing the expected cost until 
a given number of repairs, maximizing the probability of being operational for 
1,000 steps, or minimizing the probability to reach a “bad” state. 

Probabilistic model checking [3,4] provides a scalable alternative to tackle 
these MDP problems, see the recent surveys [5,6]. The central computational 
issue in MDP model checking is to solve a system of linear inequalities. In absence 
of non-determinism—the MDP being a Markov Chain (MC)—a linear equation 
system is obtained. After appropriate pre-computations, such as determining 
the states for which no policy exists that eventually reaches the goal state, the 
(in)equation system has a unique solution that coincides with the extremal value 
that is sought for. Possible solution techniques to compute such solutions include
policy iteration, linear programming, and value iteration. Modern probabilistic
model checkers such as PRISM [7] and Storm [8] use value iteration by default.
This approximates a fixed point from below by determining the probabilities to
reach a target state within k steps in the k-th iteration. The iteration is typically
stopped if the difference between the value vectors of two successive iterations (or
of iterations that are further apart) is below the desired accuracy ε.
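
The weakness of this stopping criterion can be reproduced with a few lines of Python. The sketch uses a hypothetical one-state Markov chain that loops with probability 0.999 and moves to the goal or to a failure state with probability 0.0005 each; the chain and all names are assumptions made for illustration.

```python
def naive_value_iteration(p_goal, p_loop, eps):
    """Iterate x <- p_goal + p_loop * x from 0 until two successive values differ by < eps."""
    x = 0.0
    iterations = 0
    while True:
        x_next = p_goal + p_loop * x
        iterations += 1
        if x_next - x < eps:
            return x_next, iterations
        x = x_next


P_GOAL, P_LOOP = 0.0005, 0.999
exact = P_GOAL / (1.0 - P_LOOP)  # closed form of the reachability probability
approx, iterations = naive_value_iteration(P_GOAL, P_LOOP, eps=1e-4)
error = exact - approx
print(approx, iterations, error)
```

The iteration stops as soon as the per-step change falls below the threshold, although the accumulated distance to the fixed point is still several orders of magnitude larger than the requested accuracy.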

This procedure however can provide results that are significantly off, as the 
iteration is stopped prematurely, e.g., since the probability mass in the MDP only 
changes slightly in a series of computational steps due to a “slow” movement. 
This problem is not new; similar problems, e.g., occur in iterative approaches to 
compute long-run averages [9] and transient measures [10] and pop up in statisti- 
cal model checking to decide when to stop simulating for unbounded reachability 
properties [11]. As recently was shown, this phenomenon does not only occur for 
hypothetical cases but affects practical benchmarks of MDP model checking too 
[12]. To remedy this, Haddad and Monmege [13] proposed to iteratively approxi- 
mate the (unique) fixed point from both below and above; a natural termination 
criterion is to halt the computation once the two approximations differ less than
2·ε. This scheme requires two starting vectors, one for each approximation. For
reachability probabilities, the conservative values zero and one can be used. For 
expected rewards, it is non-trivial to find an appropriate upper bound—how to 
"guess" an adequate upper bound to the expected reward to reach a goal state? 
Baier et al. [12] recently provided an algorithm to solve this issue. 

This paper takes an alternative perspective to obtaining a sound variant of 
value iteration. Our approach does not require the a priori computation of start- 
ing vectors and converges faster on many benchmarks. The crux of our tech- 
nique is to give tight and safe bounds—whose computation is cheap and that 
are obtained during the course of value iteration—on the reachability probabil- 
ities. The approach is simple and can be lifted straightforwardly to expected 
rewards. The central idea is to split the desired probability for reaching a target 
state into the sum of 


i) the probability for reaching a target state within k steps and
ii) the probability for reaching a target state only after k steps.


We obtain (i) via k iterations of (standard) value iteration. A second instance 
of value iteration computes the probability that a target state is still reachable 
after k steps. We show that from this information safe lower and upper bounds for 
(ii) can be derived. We illustrate that the same idea can be applied to expected 
rewards, topological value iteration [14], and Gauss-Seidel value iteration. We 
also discuss in detail its extension to MDPs and provide extensive experimental 
evaluation using our implementation in the model checker Storm [8]. Our experi- 
ments show that on many practical benchmarks we need significantly fewer iter- 
ations, yielding a speed-up of about 20% on average. More importantly though, 
is the conceptual simplicity of our approach. 
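
For a Markov chain, the split into (i) and (ii) can be written down as below. This is only an illustrative sketch in the path-set notation of Sect. 2.1; the crude bound on (ii) shown on the right is an assumption added here to make the idea concrete and is weaker than the bounds the approach actually derives.

```latex
\[
  \Pr(\lozenge G)
  \;=\; \underbrace{\Pr(\lozenge^{\le k} G)}_{\text{(i)}}
  \;+\; \underbrace{\Pr\bigl(\lozenge G \setminus \lozenge^{\le k} G\bigr)}_{\text{(ii)}},
  \qquad
  0 \;\le\; \text{(ii)} \;\le\; 1 - \Pr(\lozenge^{\le k} G).
\]
```

Since (i) is exactly what k steps of standard value iteration compute, any safe lower and upper bound on (ii) immediately yields a safe interval around the sought probability.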
[Figure omitted: panel (a) shows a sample MC D, panel (b) a sample MDP M.]

Fig. 1. Example models.


2 Preliminaries 


For a finite set S and vector x ∈ ℝ^{|S|}, let x[s] ∈ ℝ denote the entry of x that
corresponds to s ∈ S. Let S′ ⊆ S and a ∈ ℝ. We write x[S′] = a to denote that
x[s] = a for all s ∈ S′. Given x, y ∈ ℝ^{|S|}, x ≤ y holds iff x[s] ≤ y[s] holds for
all s ∈ S. For a function f : ℝ^{|S|} → ℝ^{|S|} and k ≥ 0 we write f^k for the function
obtained by applying f k times, i.e., f^0(x) = x and f^k(x) = f(f^{k−1}(x)) if k > 0.


2.1 Probabilistic Models and Measures 


We briefly present probabilistic models and their properties. More details can 
be found in, e.g., [15]. 


Definition 1 (Probabilistic Models). A Markov Decision Process (MDP) is
a tuple M = (S, Act, P, s_I, ρ), where

- S is a finite set of states, Act is a finite set of actions, s_I is the initial state,
- P : S × Act × S → [0,1] is a transition probability function satisfying
  Σ_{s′∈S} P(s, α, s′) ∈ {0,1} for all s ∈ S, α ∈ Act, and
- ρ : S × Act → ℝ is a reward function.

M is a Markov Chain (MC) if |Act| = 1.

Example 1. Figure 1 shows an example MC and an example MDP.


We often simplify notations for MCs by omitting the (unique) action. For an
MDP M = (S, Act, P, s_I, ρ), the set of enabled actions of state s ∈ S is given
by Act(s) = {α ∈ Act | Σ_{s′∈S} P(s, α, s′) = 1}. We assume that Act(s) ≠ ∅ for
each s ∈ S. Intuitively, upon performing action α at state s reward ρ(s, α) is
collected and with probability P(s, α, s′) we move to s′ ∈ S. Notice that rewards
can be positive or negative.

A state s ∈ S is called absorbing if P(s, α, s) = 1 for every α ∈ Act(s). A path
of M is an infinite alternating sequence π = s_0 α_0 s_1 α_1 ... where s_i ∈ S, α_i ∈
Act(s_i), and P(s_i, α_i, s_{i+1}) > 0 for all i ≥ 0. The set of paths of M is denoted by
Paths^M. The set of paths that start at s ∈ S is given by Paths^{M,s}. A finite path
π̂ = s_0 α_0 ... α_{n−1} s_n is a finite prefix of a path ending with last(π̂) = s_n ∈ S.
|π̂| = n is the length of π̂, Paths^M_fin is the set of finite paths of M, and Paths^{M,s}_fin
is the set of finite paths that start at state s ∈ S. We consider LTL-like notations
for sets of paths. For k ∈ ℕ ∪ {∞} and G, H ⊆ S let

\[
  H \,\mathcal{U}^{\le k}\, G = \{\, s_0 \alpha_0 s_1 \dots \in \mathit{Paths}^{M,s_I} \mid s_0, \dots, s_{j-1} \in H,\ s_j \in G \text{ for some } j \le k \,\}
\]

denote the set of paths that, starting from the initial state s_I, only visit states in
H until after at most k steps a state in G is reached. Sets H U^{>k} G and H U^{=k} G
are defined similarly. We use the shorthands ◇^{≤k}G := S U^{≤k} G, ◇G := ◇^{≤∞}G,
and □^{≤k}G := Paths^{M,s_I} \ ◇^{≤k}(S \ G).

A (deterministic) scheduler for M is a function σ : Paths^M_fin → Act such
that σ(π̂) ∈ Act(last(π̂)) for all π̂ ∈ Paths^M_fin. The set of (deterministic)
schedulers for M is 𝔖^M. σ ∈ 𝔖^M is called positional if σ(π̂) only depends on
the last state of π̂, i.e., for all π̂, π̂′ ∈ Paths^M_fin we have last(π̂) = last(π̂′)
implies σ(π̂) = σ(π̂′). For MDP M and scheduler σ ∈ 𝔖^M the probability
measure over finite paths is given by Pr^{M,σ}_fin : Paths^{M,s_I}_fin → [0,1] with
Pr^{M,σ}_fin(s_0 ... s_n) = Π_{i=0}^{n−1} P(s_i, σ(s_0 ... s_i), s_{i+1}). The probability
measure Pr^{M,σ} over measurable sets of infinite paths is obtained via a standard
cylinder set construction [15].


Definition 2 (Reachability Probability). The reachability probability of
MDP M = (S, Act, P, s_I, ρ), G ⊆ S, and σ ∈ 𝔖^M is given by Pr^{M,σ}(◇G).


For k ∈ ℕ ∪ {∞}, the function ◆^{≤k}G : ◇G → ℝ yields the k-bounded reachability
reward of a path π = s_0 α_0 s_1 ... ∈ ◇G. We set ◆^{≤k}G(π) = Σ_{i=0}^{j−1} ρ(s_i, α_i), where
j = min({i ≥ 0 | s_i ∈ G} ∪ {k}). We write ◆G instead of ◆^{≤∞}G.


Definition 3 (Expected Reward). The expected (reachability) reward of
MDP M = (S, Act, P, s_I, ρ), G ⊆ S, and σ ∈ 𝔖^M with Pr^{M,σ}(◇G) = 1 is
given by the expectation E^{M,σ}(◆G) = ∫_{π∈◇G} ◆G(π) dPr^{M,σ}(π).


We write Pr^{M,σ}_s and E^{M,σ}_s for the probability measure and expectation obtained
by changing the initial state of M to s ∈ S. If M is a Markov chain,
there is only a single scheduler. In this case we may omit the superscript σ
from Pr^{M,σ} and E^{M,σ}. We also omit the superscript M if it is clear from
the context. The maximal reachability probability of M and G is given by
Pr^max(◇G) = max_{σ∈𝔖^M} Pr^σ(◇G). There is a positional scheduler that attains
this maximum [16]. The same holds for minimal reachability probabilities and
maximal or minimal expected rewards.


Example 2. Consider the MDP M from Fig. 1(b). We are interested in the
maximal probability to reach state s4 given by Pr^max(◇{s4}). Since s4 is not
reachable from s3 we have Pr^max_{s3}(◇{s4}) = 0. Intuitively, choosing action β
at state s0 makes reaching s3 more likely, which should be avoided in order
to maximize the probability to reach s4. We therefore assume a scheduler σ
that always chooses action α at state s0. Starting from the initial state s0,
we then eventually take the transition from s2 to s3 or the transition from s2
to s4 with probability one. The resulting probability to reach s4 is given by
Pr^max(◇{s4}) = Pr^σ(◇{s4}) = 0.3/(0.1 + 0.3) = 0.75.
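
The quotient 0.3/(0.1 + 0.3) can be read as the limit of a geometric series. The derivation below is a sketch under the assumption that, under the chosen scheduler, the remaining probability mass 1 − 0.1 − 0.3 = 0.6 always returns (with probability one) to the state that offers the two transitions; the figure itself is not reproduced here, so this structural reading is not confirmed by the text.

```latex
\[
  \Pr\nolimits^{\sigma}(\lozenge\{s_4\})
  \;=\; \sum_{i=0}^{\infty} 0.6^{\,i} \cdot 0.3
  \;=\; \frac{0.3}{1 - 0.6}
  \;=\; \frac{0.3}{0.1 + 0.3}
  \;=\; 0.75 .
\]
```

Each summand is the probability of returning i times before finally taking the transition to s4.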


2.2 Probabilistic Model Checking via Interval Iteration 


In the following we present approaches to compute reachability probabilities
and expected rewards. We consider approximate computations. Exact computations
are handled in, e.g., [17,18]. For the sake of clarity, we focus on reachability
probabilities and sketch how the techniques can be lifted to expected rewards.


Reachability Probabilities. We fix an MDP $\mathcal{M} = (S, Act, P, s_I, \rho)$, a set of
goal states $G \subseteq S$, and a precision parameter $\varepsilon > 0$.


Problem 1. Compute an $\varepsilon$-approximation of the maximal reachability probability
$\mathrm{Pr}^{\max}(\Diamond G)$, i.e., compute a value $r \in [0,1]$ with $|r - \mathrm{Pr}^{\max}(\Diamond G)| < \varepsilon$.


We briefly sketch how to compute such a value r via interval iteration [12,13,19].
The computation for minimal reachability probabilities is analogous.

W.l.o.g. it is assumed that the states in G are absorbing. Using graph algorithms,
we compute $S_0 = \{s \in S \mid \mathrm{Pr}^{\max}_s(\Diamond G) = 0\}$ and partition the state space
of $\mathcal{M}$ into $S = S_0 \cup G \cup S_?$ with $S_? = S \setminus (G \cup S_0)$. If $s_I \in S_0$ or $s_I \in G$, the
probability $\mathrm{Pr}^{\max}(\Diamond G)$ is 0 or 1, respectively. From now on we assume $s_I \in S_?$.

We say that $\mathcal{M}$ is contracting with respect to $S' \subseteq S$ if $\mathrm{Pr}^{\sigma}_s(\Diamond S') = 1$ for all
$s \in S$ and for all $\sigma \in \mathfrak{S}^{\mathcal{M}}$. We assume that $\mathcal{M}$ is contracting with respect to
$G \cup S_0$. Otherwise, we apply a transformation on the so-called end components¹
of $\mathcal{M}$, yielding a contracting MDP $\mathcal{M}'$ with the same maximal reachability
probability as $\mathcal{M}$. Roughly, this transformation replaces each end component
of $\mathcal{M}$ with a single state whose enabled actions coincide with the actions that
previously led outside of the end component. This step is detailed in [13,19].

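We have $x^*[s] = \mathrm{Pr}^{\max}_s(\Diamond G)$ for $s \in S$ and the unique fixpoint $x^*$ of the
function $f \colon \mathbb{R}^{|S|} \to \mathbb{R}^{|S|}$ with $f(x)[S_0] = 0$, $f(x)[G] = 1$, and

$$f(x)[s] = \max_{\alpha \in Act(s)} \sum_{s' \in S} P(s, \alpha, s') \cdot x[s']$$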


for $s \in S_?$. Hence, computing $\mathrm{Pr}^{\max}(\Diamond G)$ reduces to finding the fixpoint of f.
A popular technique for this purpose is the value iteration algorithm [1].
Given a starting vector $x \in \mathbb{R}^{|S|}$ with $x[S_0] = 0$ and $x[G] = 1$, standard value
iteration computes $f^k(x)$ for increasing k until $\max_{s \in S} |f^k(x)[s] - f^{k-1}(x)[s]| < \varepsilon$
holds for a predefined precision $\varepsilon > 0$. As pointed out in, e.g., [13], there is no
guarantee on the preciseness of the result $r = f^k(x)[s_I]$, i.e., standard value
iteration does not give any evidence on the error $|r - \mathrm{Pr}^{\max}(\Diamond G)|$. The intuitive
reason is that value iteration only approximates the fixpoint $x^*$ from one side,
yielding no indication on the distance between the current result and $x^*$.


¹ Intuitively, an end component is a set of states $S' \subseteq S$ such that there is a scheduler
inducing that from any $s \in S'$ exactly the states in $S'$ are visited infinitely often.


Example 3. Consider the MDP $\mathcal{M}$ from Fig. 1(b). We invoked standard value
iteration in PRISM [7] and Storm [8] to compute the reachability probability
$\mathrm{Pr}^{\max}(\Diamond\{s_4\})$. Recall from Example 2 that the correct solution is 0.75. With
(absolute) precision $\varepsilon = 10^{-6}$ both model checkers returned 0.7248. Notice that
the user can improve the precision by considering, e.g., $\varepsilon = 10^{-8}$ which yields
0.7497. However, there is no guarantee on the preciseness of a given result.
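
The effect described in Example 3 can be reproduced with a few lines of Python. The following is only a sketch under stated assumptions: the five-state Markov chain `P` is hypothetical (its transition probabilities were chosen to agree with the numbers reported later in Example 5, not copied from Fig. 1), and the names `standard_value_iteration`, `GOAL` and `ZERO` are illustrative. The stopping criterion is the one described above, i.e., the iteration ends once two successive iterates differ by less than the precision.

```python
# Hypothetical Markov chain (an assumption, not taken from Fig. 1):
# states 3 and 4 are absorbing, state 4 is the goal.
P = {
    0: {0: 0.99, 1: 0.01},
    1: {0: 0.99, 2: 0.01},
    2: {0: 0.6, 3: 0.1, 4: 0.3},
    3: {3: 1.0},
    4: {4: 1.0},
}
GOAL = {4}
ZERO = {3}  # S_0: states from which the goal is unreachable


def standard_value_iteration(P, goal, zero, eps):
    """Iterate x <- f(x) until successive iterates differ by less than eps."""
    x = {s: 1.0 if s in goal else 0.0 for s in P}
    iterations = 0
    while True:
        iterations += 1
        new_x = dict(x)
        for s in P:
            if s not in goal and s not in zero:
                new_x[s] = sum(p * x[t] for t, p in P[s].items())
        diff = max(abs(new_x[s] - x[s]) for s in P)
        x = new_x
        if diff < eps:
            return x, iterations


values, iterations = standard_value_iteration(P, GOAL, ZERO, 1e-6)
# The exact reachability probability from state 0 is 0.75 in this chain.
print(values[0], abs(values[0] - 0.75), iterations)
```

In this chain the iterates approach 0.75 from below so slowly that the difference criterion is met while the error is still several orders of magnitude larger than the requested precision.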


The interval iteration algorithm [12,13,19] addresses the impreciseness of
value iteration. The idea is to approach the fixpoint $x^*$ from below and from
above. The first step is to find starting vectors $x_\ell, x_u \in \mathbb{R}^{|S|}$ satisfying
$x_\ell[S_0] = x_u[S_0] = 0$, $x_\ell[G] = x_u[G] = 1$, and $x_\ell \le x^* \le x_u$. As the entries of $x^*$ are
probabilities, it is always valid to set $x_\ell[S_?] = 0$ and $x_u[S_?] = 1$. We have
$f^k(x_\ell) \le x^* \le f^k(x_u)$ for any $k \ge 0$. Interval iteration computes $f^k(x_\ell)$ and
$f^k(x_u)$ for increasing k until $\max_{s \in S} |f^k(x_\ell)[s] - f^k(x_u)[s]| < 2\varepsilon$. For the result
$r = 1/2 \cdot (f^k(x_\ell)[s_I] + f^k(x_u)[s_I])$ we obtain that $|r - \mathrm{Pr}^{\max}(\Diamond G)| < \varepsilon$, i.e., we
get a sound approximation of the maximal reachability probability.


Example 4. We invoked interval iteration in PRISM and Storm to compute the
reachability probability $\mathrm{Pr}^{\max}(\Diamond\{s_4\})$ for the MDP $\mathcal{M}$ from Fig. 1(b). Both
implementations correctly yield an $\varepsilon$-approximation of $\mathrm{Pr}^{\max}(\Diamond\{s_4\})$, where we
considered $\varepsilon = 10^{-6}$. However, both PRISM and Storm required roughly 300,000
iterations for convergence.
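
Interval iteration as described above can be sketched as follows. This is a minimal illustration, not the PRISM or Storm implementation: the Markov chain `P` is a hypothetical stand-in (chosen to agree with the values of Example 5), the partition into `TRANSIENT` states ($S_?$), goal state 4 and $S_0 = \{3\}$ is assumed to be known, and the function names are made up for this example.

```python
# Hypothetical Markov chain (an assumption); states 3 and 4 are absorbing,
# 4 is the goal and 3 cannot reach the goal.
P = {
    0: {0: 0.99, 1: 0.01},
    1: {0: 0.99, 2: 0.01},
    2: {0: 0.6, 3: 0.1, 4: 0.3},
}
TRANSIENT = [0, 1, 2]  # S_?


def apply_f(x):
    """One application of f; entries for goal and S_0 states stay fixed."""
    new_x = dict(x)
    for s in TRANSIENT:
        new_x[s] = sum(p * x[t] for t, p in P[s].items())
    return new_x


def interval_iteration(init, eps):
    """Iterate from below and from above until the vectors are 2*eps-close."""
    lower = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0}
    upper = {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.0, 4: 1.0}
    iterations = 0
    while max(upper[s] - lower[s] for s in TRANSIENT) >= 2 * eps:
        lower, upper = apply_f(lower), apply_f(upper)
        iterations += 1
    return 0.5 * (lower[init] + upper[init]), iterations


result, iterations = interval_iteration(0, 1e-6)
print(result, iterations)
```

The result is sound, but the trivial upper starting vector only shrinks at the rate at which probability mass leaves $S_?$, which is why a large number of iterations is needed on slowly mixing models.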


Expected Rewards. Whereas [13,19] only consider reachability probabilities, 
[12] extends interval iteration to compute expected rewards. Let M be an MDP 
and G be a set of absorbing states such that M is contracting with respect to G. 


Problem 2. Compute an $\varepsilon$-approximation of the maximal expected reachability
reward $\mathbb{E}^{\max}(\blacklozenge G)$, i.e., compute a value $r \in \mathbb{R}$ with $|r - \mathbb{E}^{\max}(\blacklozenge G)| < \varepsilon$.


We have $x^*[s] = \mathbb{E}^{\max}_s(\blacklozenge G)$ for the unique fixpoint $x^*$ of $g \colon \mathbb{R}^{|S|} \to \mathbb{R}^{|S|}$ with

$$g(x)[G] = 0 \quad\text{and}\quad g(x)[s] = \max_{\alpha \in Act(s)} \rho(s, \alpha) + \sum_{s' \in S} P(s, \alpha, s') \cdot x[s']$$

for $s \notin G$. As for reachability probabilities, interval iteration can be applied to
approximate this fixpoint. The crux lies in finding appropriate starting vectors
$x_\ell, x_u \in \mathbb{R}^{|S|}$ guaranteeing $x_\ell \le x^* \le x_u$. To this end, [12] describes graph-based
algorithms that give an upper bound on the expected number of times each
individual state $s \in S \setminus G$ is visited. This then yields an approximation of the
expected amount of reward collected at the various states.
3 Sound Value Iteration for MCs


We present an algorithm for computing reachability probabilities and expected
rewards as in Problems 1 and 2. The algorithm is an alternative to the interval
iteration approach [12,20] but (i) does not require an a priori computation
of starting vectors $x_\ell, x_u \in \mathbb{R}^{|S|}$ and (ii) converges faster on many practical
benchmarks as shown in Sect. 5. For the sake of simplicity, we first restrict to
computing reachability probabilities on MCs.

In the following, let $\mathcal{D} = (S, P, s_I, \rho)$ be an MC, $G \subseteq S$ be a set of absorbing
goal states and $\varepsilon > 0$ be a precision parameter. We consider the partition
$S = S_0 \cup G \cup S_?$ as in Sect. 2.2. The following theorem captures the key insight of our
algorithm.


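Theorem 1. For MC $\mathcal{D}$ let G and $S_?$ be as above and $k \ge 0$ with $\mathrm{Pr}_s(\Box^{\le k} S_?) < 1$
for all $s \in S_?$. We have

$$\mathrm{Pr}(\Diamond^{\le k} G) + \mathrm{Pr}(\Box^{\le k} S_?) \cdot \min_{s \in S_?} \frac{\mathrm{Pr}_s(\Diamond^{\le k} G)}{1 - \mathrm{Pr}_s(\Box^{\le k} S_?)}
\;\le\; \mathrm{Pr}(\Diamond G) \;\le\;
\mathrm{Pr}(\Diamond^{\le k} G) + \mathrm{Pr}(\Box^{\le k} S_?) \cdot \max_{s \in S_?} \frac{\mathrm{Pr}_s(\Diamond^{\le k} G)}{1 - \mathrm{Pr}_s(\Box^{\le k} S_?)}.$$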


Theorem 1 allows us to approximate $\mathrm{Pr}(\Diamond G)$ by computing for increasing $k \in \mathbb{N}$


- $\mathrm{Pr}(\Diamond^{\le k} G)$, the probability to reach a state in G within k steps, and
- $\mathrm{Pr}(\Box^{\le k} S_?)$, the probability to stay in $S_?$ during the first k steps.


This can be realized via a value-iteration based procedure. The obtained bounds
on $\mathrm{Pr}(\Diamond G)$ can be tightened arbitrarily since $\mathrm{Pr}(\Box^{\le k} S_?)$ approaches 0 for
increasing k. In the following, we address the correctness of Theorem 1, describe the
details of our algorithm, and indicate how the results can be lifted to expected
rewards.


3.1 Approximating Reachability Probabilities 


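To approximate the reachability probability $\mathrm{Pr}(\Diamond G)$, we consider the step
bounded reachability probability $\mathrm{Pr}(\Diamond^{\le k} G)$ for $k \ge 0$ and provide a lower and
an upper bound for the 'missing' probability $\mathrm{Pr}(\Diamond G) - \mathrm{Pr}(\Diamond^{\le k} G)$. Note that $\Diamond G$
is the disjoint union of the paths that reach G within k steps (given by $\Diamond^{\le k} G$)
and the paths that reach G only after k steps (given by $S_? \,\mathrm{U}^{>k}\, G$).


Lemma 1. For any $k \ge 0$ we have $\mathrm{Pr}(\Diamond G) = \mathrm{Pr}(\Diamond^{\le k} G) + \mathrm{Pr}(S_? \,\mathrm{U}^{>k}\, G)$.


A path $\pi \in S_? \,\mathrm{U}^{>k}\, G$ reaches some state $s \in S_?$ after exactly k steps. This
yields the partition $S_? \,\mathrm{U}^{>k}\, G = \bigcup_{s \in S_?} (S_? \,\mathrm{U}^{=k}\, \{s\} \cap \Diamond G)$. It follows that

$$\mathrm{Pr}(S_? \,\mathrm{U}^{>k}\, G) = \sum_{s \in S_?} \mathrm{Pr}(S_? \,\mathrm{U}^{=k}\, \{s\}) \cdot \mathrm{Pr}_s(\Diamond G).$$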
Consider $\ell, u \in [0,1]$ with $\ell \le \mathrm{Pr}_s(\Diamond G) \le u$ for all $s \in S_?$, i.e., $\ell$ and u are
lower and upper bounds for the reachability probabilities within $S_?$. We have

$$\sum_{s \in S_?} \mathrm{Pr}(S_? \,\mathrm{U}^{=k}\, \{s\}) \cdot \mathrm{Pr}_s(\Diamond G) \;\le\; \sum_{s \in S_?} \mathrm{Pr}(S_? \,\mathrm{U}^{=k}\, \{s\}) \cdot u \;=\; \mathrm{Pr}(\Box^{\le k} S_?) \cdot u.$$

We can argue similarly for the lower bound $\ell$. With Lemma 1 we get the
following.


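Proposition 1. For MC $\mathcal{D}$ with G, $S_?$, $\ell$, u as above and any $k \ge 0$ we have

$$\mathrm{Pr}(\Diamond^{\le k} G) + \mathrm{Pr}(\Box^{\le k} S_?) \cdot \ell \;\le\; \mathrm{Pr}(\Diamond G) \;\le\; \mathrm{Pr}(\Diamond^{\le k} G) + \mathrm{Pr}(\Box^{\le k} S_?) \cdot u.$$


Remark 1. The bounds for $\mathrm{Pr}(\Diamond G)$ given by Proposition 1 are similar to the
bounds obtained after performing k iterations of interval iteration with starting
vectors $x_\ell, x_u \in \mathbb{R}^{|S|}$, where $x_\ell[S_?] = \ell$ and $x_u[S_?] = u$.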


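We now discuss how the bounds $\ell$ and u can be obtained from the step bounded
probabilities $\mathrm{Pr}_s(\Diamond^{\le k} G)$ and $\mathrm{Pr}_s(\Box^{\le k} S_?)$ for $s \in S_?$. We focus on the upper
bound u. The reasoning for the lower bound $\ell$ is similar.

Let $s_{\max} \in S_?$ be a state with maximal reachability probability, that is
$s_{\max} \in \arg\max_{s \in S_?} \mathrm{Pr}_s(\Diamond G)$. From Proposition 1 we get

$$\mathrm{Pr}_{s_{\max}}(\Diamond G) \;\le\; \mathrm{Pr}_{s_{\max}}(\Diamond^{\le k} G) + \mathrm{Pr}_{s_{\max}}(\Box^{\le k} S_?) \cdot \mathrm{Pr}_{s_{\max}}(\Diamond G).$$

We solve the inequality for $\mathrm{Pr}_{s_{\max}}(\Diamond G)$ (assuming $\mathrm{Pr}_s(\Box^{\le k} S_?) < 1$ for all
$s \in S_?$):

$$\mathrm{Pr}_{s_{\max}}(\Diamond G) \;\le\; \frac{\mathrm{Pr}_{s_{\max}}(\Diamond^{\le k} G)}{1 - \mathrm{Pr}_{s_{\max}}(\Box^{\le k} S_?)} \;\le\; \max_{s \in S_?} \frac{\mathrm{Pr}_s(\Diamond^{\le k} G)}{1 - \mathrm{Pr}_s(\Box^{\le k} S_?)}.$$


Proposition 2. For MC $\mathcal{D}$ let G and $S_?$ be as above and $k \ge 0$ such that
$\mathrm{Pr}_s(\Box^{\le k} S_?) < 1$ for all $s \in S_?$. For every $\hat{s} \in S_?$ we have

$$\min_{s \in S_?} \frac{\mathrm{Pr}_s(\Diamond^{\le k} G)}{1 - \mathrm{Pr}_s(\Box^{\le k} S_?)} \;\le\; \mathrm{Pr}_{\hat{s}}(\Diamond G) \;\le\; \max_{s \in S_?} \frac{\mathrm{Pr}_s(\Diamond^{\le k} G)}{1 - \mathrm{Pr}_s(\Box^{\le k} S_?)}.$$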


Theorem 1 is a direct consequence of Propositions 1 and 2. 
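
The step from Propositions 1 and 2 to Theorem 1 can be spelled out as a short substitution. The following LaTeX sketch uses the notation of this section; the abbreviation $q_s$ is introduced here only for readability and does not appear in the original text.

```latex
% Abbreviation (introduced here for readability only):
%   q_s := Pr_s(<>^{<=k} G) / (1 - Pr_s([]^{<=k} S_?))   for s in S_?.
\begin{align*}
  &\text{Proposition 2:} &
  \ell := \min_{s \in S_?} q_s \;&\le\; \mathrm{Pr}_{\hat{s}}(\Diamond G)
  \;\le\; \max_{s \in S_?} q_s =: u
  \quad \text{for all } \hat{s} \in S_?, \\
  &\text{Proposition 1 with these } \ell, u\text{:} &
  \mathrm{Pr}(\Diamond^{\le k} G) + \mathrm{Pr}(\Box^{\le k} S_?) \cdot \min_{s \in S_?} q_s
  \;&\le\; \mathrm{Pr}(\Diamond G)
  \;\le\; \mathrm{Pr}(\Diamond^{\le k} G) + \mathrm{Pr}(\Box^{\le k} S_?) \cdot \max_{s \in S_?} q_s .
\end{align*}
```

The second line is the statement of Theorem 1.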


3.2 Extending the Value Iteration Approach 


Recall the standard value iteration algorithm for approximating $\mathrm{Pr}(\Diamond G)$ as
discussed in Sect. 2.2. The function $f \colon \mathbb{R}^{|S|} \to \mathbb{R}^{|S|}$ for MCs simplifies to
$f(x)[S_0] = 0$, $f(x)[G] = 1$, and $f(x)[s] = \sum_{s' \in S} P(s, s') \cdot x[s']$ for $s \in S_?$. We
can compute the k-step bounded reachability probability at every state $s \in S$
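Input : MC $\mathcal{D} = (S, P, s_I, \rho)$, absorbing states $G \subseteq S$, precision $\varepsilon > 0$
Output : $r \in \mathbb{R}$ with $|r - \mathrm{Pr}(\Diamond G)| < \varepsilon$

$S_? \leftarrow S \setminus (\{s \in S \mid \mathrm{Pr}_s(\Diamond G) = 0\} \cup G)$
initialize $x_0, y_0 \in \mathbb{R}^{|S|}$ with $x_0[G] = 1$, $x_0[S \setminus G] = 0$, $y_0[S_?] = 1$, $y_0[S \setminus S_?] = 0$
$\ell_0 \leftarrow -\infty$; $u_0 \leftarrow +\infty$
$k \leftarrow 0$
repeat
    $k \leftarrow k + 1$
    $x_k \leftarrow f(x_{k-1})$; $y_k \leftarrow h(y_{k-1})$
    if $y_k[s] < 1$ for all $s \in S_?$ then
        $\ell_k \leftarrow \max(\ell_{k-1}, \min_{s \in S_?} \frac{x_k[s]}{1 - y_k[s]})$; $u_k \leftarrow \min(u_{k-1}, \max_{s \in S_?} \frac{x_k[s]}{1 - y_k[s]})$
until $y_k[s_I] \cdot (u_k - \ell_k) < 2 \cdot \varepsilon$
return $x_k[s_I] + y_k[s_I] \cdot \frac{\ell_k + u_k}{2}$


Algorithm 1: Sound value iteration for MCs.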


by performing k iterations of value iteration [15, Remark 10.104]. More precisely,
when applying f k times on starting vector $x \in \mathbb{R}^{|S|}$ with $x[G] = 1$ and
$x[S \setminus G] = 0$ we get $\mathrm{Pr}_s(\Diamond^{\le k} G) = f^k(x)[s]$. The probabilities $\mathrm{Pr}_s(\Box^{\le k} S_?)$ for
$s \in S$ can be computed similarly. Let $h \colon \mathbb{R}^{|S|} \to \mathbb{R}^{|S|}$ with $h(y)[S \setminus S_?] = 0$ and
$h(y)[s] = \sum_{s' \in S} P(s, s') \cdot y[s']$ for $s \in S_?$. For starting vector $y \in \mathbb{R}^{|S|}$ with
$y[S_?] = 1$ and $y[S \setminus S_?] = 0$ we get $\mathrm{Pr}_s(\Box^{\le k} S_?) = h^k(y)[s]$.

Algorithm 1 depicts our approach. It maintains vectors $x_k, y_k \in \mathbb{R}^{|S|}$
which, after k iterations of the loop, store the k-step bounded probabilities
$\mathrm{Pr}_s(\Diamond^{\le k} G)$ and $\mathrm{Pr}_s(\Box^{\le k} S_?)$, respectively. Additionally, the algorithm considers
lower bounds $\ell_k$ and upper bounds $u_k$ such that the following invariant holds.


Lemma 2. After executing the loop of Algorithm 1 k times we have for all $s \in S_?$
that $x_k[s] = \mathrm{Pr}_s(\Diamond^{\le k} G)$, $y_k[s] = \mathrm{Pr}_s(\Box^{\le k} S_?)$, and $\ell_k \le \mathrm{Pr}_s(\Diamond G) \le u_k$.


The correctness of the algorithm follows from Theorem 1. Termination is guaranteed
since $\mathrm{Pr}(\Diamond(S_0 \cup G)) = 1$ and therefore $\lim_{k \to \infty} \mathrm{Pr}(\Box^{\le k} S_?) = \mathrm{Pr}(\Box S_?) = 0$.


Theorem 2. Algorithm 1 terminates for any MC $\mathcal{D}$, goal states G, and precision
$\varepsilon > 0$. The returned value r satisfies $|r - \mathrm{Pr}(\Diamond G)| < \varepsilon$.


Example 5. We apply Algorithm 1 for the MC in Fig. 1(a) and the set of goal
states $G = \{s_4\}$. We have $S_? = \{s_0, s_1, s_2\}$. After k = 3 iterations it holds that

$x_3[s_0] = 0.00003$    $x_3[s_1] = 0.003$    $x_3[s_2] = 0.3$
$y_3[s_0] = 0.99996$    $y_3[s_1] = 0.996$    $y_3[s_2] = 0.6$

Hence, $\frac{x_3[s]}{1 - y_3[s]} = \frac{3}{4} = 0.75$ for all $s \in S_?$. We get $\ell_3 = u_3 = 0.75$. The algorithm
converges for any $\varepsilon > 0$ and returns the correct solution $x_3[s_0] + y_3[s_0] \cdot 0.75 = 0.75$.
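
Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch, not the Storm implementation: the Markov chain `P` is hypothetical (its transition probabilities are an assumption, chosen so that the iterates match the values listed in Example 5), the set $S_?$ is passed in as `transient` instead of being computed by a graph analysis, and the explicit check for `y[init] == 0.0` is an addition that avoids multiplying zero by an infinite bound.

```python
import math

# Hypothetical Markov chain (an assumption, chosen to match Example 5);
# states 3 and 4 are absorbing, state 4 is the goal.
P = {
    0: {0: 0.99, 1: 0.01},
    1: {0: 0.99, 2: 0.01},
    2: {0: 0.6, 3: 0.1, 4: 0.3},
    3: {3: 1.0},
    4: {4: 1.0},
}


def sound_value_iteration(P, goal, transient, init, eps):
    # x[s]: probability to reach the goal from s within k steps.
    # y[s]: probability to stay in S_? during the first k steps from s.
    x = {s: 1.0 if s in goal else 0.0 for s in P}
    y = {s: 1.0 if s in transient else 0.0 for s in P}
    lower, upper = -math.inf, math.inf
    k = 0
    while True:
        k += 1
        new_x, new_y = dict(x), dict(y)
        for s in transient:
            new_x[s] = sum(p * x[t] for t, p in P[s].items())
            new_y[s] = sum(p * y[t] for t, p in P[s].items())
        x, y = new_x, new_y
        if all(y[s] < 1.0 for s in transient):
            ratios = [x[s] / (1.0 - y[s]) for s in transient]
            lower = max(lower, min(ratios))
            upper = min(upper, max(ratios))
        if y[init] == 0.0:  # nothing is left undecided: x[init] is exact
            return x[init], k
        if y[init] * (upper - lower) < 2 * eps:
            return x[init] + y[init] * (lower + upper) / 2, k


result, iterations = sound_value_iteration(P, {4}, [0, 1, 2], 0, 1e-6)
print(result, iterations)
```

On this chain the bounds coincide (up to rounding) as soon as every state of $S_?$ can leave $S_?$ within k steps, which mirrors the observation of Example 5.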
3.3 Sound Value Iteration for Expected Rewards


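We lift our approach to expected rewards in a straightforward manner. Let $G \subseteq S$
be a set of absorbing goal states of MC $\mathcal{D}$ such that $\mathrm{Pr}(\Diamond G) = 1$. Further let
$S_? = S \setminus G$. For $k \ge 0$ we observe that the expected reward $\mathbb{E}(\blacklozenge G)$ can be split into the
expected reward collected within k steps and the expected reward collected only
after k steps, i.e., $\mathbb{E}(\blacklozenge G) = \mathbb{E}(\blacklozenge^{\le k} G) + \sum_{s \in S_?} \mathrm{Pr}(S_? \,\mathrm{U}^{=k}\, \{s\}) \cdot \mathbb{E}_s(\blacklozenge G)$. Following
a similar reasoning as in Sect. 3.1 we can show the following.

Theorem 3. For MC $\mathcal{D}$ let G and $S_?$ be as before and $k \ge 0$ such that
$\mathrm{Pr}_s(\Box^{\le k} S_?) < 1$ for all $s \in S_?$. We have

$$\mathbb{E}(\blacklozenge^{\le k} G) + \mathrm{Pr}(\Box^{\le k} S_?) \cdot \min_{s \in S_?} \frac{\mathbb{E}_s(\blacklozenge^{\le k} G)}{1 - \mathrm{Pr}_s(\Box^{\le k} S_?)}
\;\le\; \mathbb{E}(\blacklozenge G) \;\le\;
\mathbb{E}(\blacklozenge^{\le k} G) + \mathrm{Pr}(\Box^{\le k} S_?) \cdot \max_{s \in S_?} \frac{\mathbb{E}_s(\blacklozenge^{\le k} G)}{1 - \mathrm{Pr}_s(\Box^{\le k} S_?)}.$$


Recall the function $g \colon \mathbb{R}^{|S|} \to \mathbb{R}^{|S|}$ from Sect. 2.2, given by $g(x)[G] = 0$ and
$g(x)[s] = \rho(s) + \sum_{s' \in S} P(s, s') \cdot x[s']$ for $s \in S_?$. For $s \in S$ and $x \in \mathbb{R}^{|S|}$ with
$x[S] = 0$ we have $\mathbb{E}_s(\blacklozenge^{\le k} G) = g^k(x)[s]$. We modify Algorithm 1 such that it
considers function g instead of function f. Then, the returned value r satisfies
$|r - \mathbb{E}(\blacklozenge G)| < \varepsilon$.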
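
The modification for expected rewards can be sketched in the same style. The example below is an assumption-laden illustration: the three-state chain `P`, the state rewards `REWARD` and the function name are made up for this sketch, and the goal is reached with probability one as required above. Solving the two linear equations by hand gives an expected reward of 8 from state 0.

```python
import math

# Hypothetical Markov chain with state rewards (illustrative numbers only).
P = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 2: 0.5}, 2: {2: 1.0}}
REWARD = {0: 1.0, 1: 2.0, 2: 0.0}
GOAL = {2}


def sound_expected_reward(P, reward, goal, init, eps):
    transient = [s for s in P if s not in goal]  # S_? = S \ G
    x = {s: 0.0 for s in P}                      # k-step bounded rewards
    y = {s: 0.0 if s in goal else 1.0 for s in P}
    lower, upper = -math.inf, math.inf
    while True:
        new_x, new_y = dict(x), dict(y)
        for s in transient:
            # g replaces f: collect the reward of s, then continue.
            new_x[s] = reward[s] + sum(p * x[t] for t, p in P[s].items())
            new_y[s] = sum(p * y[t] for t, p in P[s].items())
        x, y = new_x, new_y
        if all(y[s] < 1.0 for s in transient):
            ratios = [x[s] / (1.0 - y[s]) for s in transient]
            lower = max(lower, min(ratios))
            upper = min(upper, max(ratios))
        if y[init] == 0.0:
            return x[init]
        if y[init] * (upper - lower) < 2 * eps:
            return x[init] + y[init] * (lower + upper) / 2


result = sound_expected_reward(P, REWARD, GOAL, 0, 1e-6)
print(result)
```

Only the update of `x` differs from the reachability case; the vector `y` and the stopping criterion are unchanged.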


3.4 Optimizations 


Algorithm 1 can make use of initial bounds $\ell_0, u_0 \in \mathbb{R}$ with $\ell_0 \le \mathrm{Pr}_s(\Diamond G) \le u_0$
for all $s \in S_?$. Such bounds could be derived, e.g., from domain knowledge or
during preprocessing [12]. The algorithm always chooses the largest available
lower bound for $\ell_k$ and the lowest available upper bound for $u_k$, respectively. If
Algorithm 1 and interval iteration are initialized with the same bounds, Algorithm 1
always requires at most as many iterations as interval iteration
(cf. Remark 1).

Gauss-Seidel value iteration [1,12] is an optimization for standard value iteration
and interval iteration that potentially leads to faster convergence. When
computing $f(x)[s]$ for $s \in S_?$, the idea is to consider already computed results
$f(x)[s']$ from the current iteration. Formally, let $\prec\; \subseteq S \times S$ be some strict total
ordering of the states. Gauss-Seidel value iteration considers instead of function
f the function $f_\prec \colon \mathbb{R}^{|S|} \to \mathbb{R}^{|S|}$ with $f_\prec[S_0] = 0$, $f_\prec[G] = 1$, and

$$f_\prec(x)[s] = \sum_{s' \prec s} P(s, s') \cdot f_\prec(x)[s'] + \sum_{s' \not\prec s} P(s, s') \cdot x[s'].$$

Values $f_\prec(x)[s]$ for $s \in S$ are computed in the order defined by $\prec$. This idea can
also be applied to our approach. To this end, we replace f by $f_\prec$ and h by $h_\prec$,
where $h_\prec$ is defined similarly. More details are given in [21].

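
The difference between the operator f and its Gauss-Seidel variant can be illustrated with two sweep functions. This is a sketch under assumptions: the Markov chain `P` is hypothetical, the list `ORDER` plays the role of the strict total ordering restricted to $S_?$, and only the plain operator is shown (not the full sound value iteration with both vectors).

```python
# Hypothetical Markov chain (an assumption); state 4 is the goal, state 3 is in S_0.
P = {
    0: {0: 0.99, 1: 0.01},
    1: {0: 0.99, 2: 0.01},
    2: {0: 0.6, 3: 0.1, 4: 0.3},
}
ORDER = [0, 1, 2]  # states of S_? listed in the assumed order


def jacobi_sweep(x):
    """Plain operator f: every entry is computed from the previous vector."""
    new_x = dict(x)
    for s in ORDER:
        new_x[s] = sum(p * x[t] for t, p in P[s].items())
    return new_x


def gauss_seidel_sweep(x):
    """Gauss-Seidel variant: entries of earlier states are reused immediately."""
    new_x = dict(x)
    for s in ORDER:
        new_x[s] = sum(p * new_x[t] for t, p in P[s].items())
    return new_x


start = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0}
jac, gs = dict(start), dict(start)
for _ in range(10):
    jac, gs = jacobi_sweep(jac), gauss_seidel_sweep(gs)
print(jac[0], gs[0])
```

Starting from below, the Gauss-Seidel iterate is never smaller than the plain iterate after the same number of sweeps, which is the source of the potential speed-up.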
Topological value iteration [14] employs the graphical structure of the MC $\mathcal{D}$.
The idea is to decompose the states S of $\mathcal{D}$ into strongly connected components²
(SCCs) that are analyzed individually. The procedure can improve the runtime
of classical value iteration since for a single iteration only the values for the
current SCC have to be updated. A topological variant of interval iteration is
introduced in [12]. Given these results, sound value iteration can be extended
similarly.


² $S' \subseteq S$ is a connected component if s can be reached from s' for all $s, s' \in S'$. $S'$ is
a strongly connected component if no superset of $S'$ is a connected component.


4 Sound Value Iteration for MDPs 


We extend sound value iteration to compute reachability probabilities in MDPs.
Assume an MDP $\mathcal{M} = (S, Act, P, s_I, \rho)$ and a set of absorbing goal states G.
For simplicity, we focus on maximal reachability probabilities, i.e., we compute
$\mathrm{Pr}^{\max}(\Diamond G)$. Minimal reachability probabilities and expected rewards are analogous.
As in Sect. 2.2 we consider the partition $S = S_0 \cup G \cup S_?$ such that $\mathcal{M}$ is
contracting with respect to $G \cup S_0$.


4.1 Approximating Maximal Reachability Probabilities 


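We argue that our results for MCs also hold for MDPs under a given scheduler
$\sigma \in \mathfrak{S}^{\mathcal{M}}$. Let $k \ge 0$ such that $\mathrm{Pr}^{\sigma}_s(\Box^{\le k} S_?) < 1$ for all $s \in S_?$. Following the
reasoning as in Sect. 3.1 we get

$$\mathrm{Pr}^{\sigma}(\Diamond^{\le k} G) + \mathrm{Pr}^{\sigma}(\Box^{\le k} S_?) \cdot \min_{s \in S_?} \frac{\mathrm{Pr}^{\sigma}_s(\Diamond^{\le k} G)}{1 - \mathrm{Pr}^{\sigma}_s(\Box^{\le k} S_?)} \;\le\; \mathrm{Pr}^{\sigma}(\Diamond G) \;\le\; \mathrm{Pr}^{\max}(\Diamond G).$$


Next, assume an upper bound $u \in \mathbb{R}$ with $\mathrm{Pr}^{\max}_s(\Diamond G) \le u$ for all $s \in S_?$. For
a scheduler $\sigma_{\max} \in \mathfrak{S}^{\mathcal{M}}$ that attains the maximal reachability probability, i.e.,
$\sigma_{\max} \in \arg\max_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \mathrm{Pr}^{\sigma}(\Diamond G)$, it holds that

$$\mathrm{Pr}^{\max}(\Diamond G) = \mathrm{Pr}^{\sigma_{\max}}(\Diamond G) \;\le\; \mathrm{Pr}^{\sigma_{\max}}(\Diamond^{\le k} G) + \mathrm{Pr}^{\sigma_{\max}}(\Box^{\le k} S_?) \cdot u
\;\le\; \max_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \left( \mathrm{Pr}^{\sigma}(\Diamond^{\le k} G) + \mathrm{Pr}^{\sigma}(\Box^{\le k} S_?) \cdot u \right).$$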


We obtain the following theorem which is the basis of our algorithm. 


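Theorem 4. For MDP $\mathcal{M}$ let G, $S_?$, and u be as above. Assume $\sigma \in \mathfrak{S}^{\mathcal{M}}$
and $k \ge 0$ such that $\sigma \in \arg\max_{\sigma' \in \mathfrak{S}^{\mathcal{M}}} \mathrm{Pr}^{\sigma'}(\Diamond^{\le k} G) + \mathrm{Pr}^{\sigma'}(\Box^{\le k} S_?) \cdot u$ and
$\mathrm{Pr}^{\sigma}_s(\Box^{\le k} S_?) < 1$ for all $s \in S_?$. We have

$$\mathrm{Pr}^{\sigma}(\Diamond^{\le k} G) + \mathrm{Pr}^{\sigma}(\Box^{\le k} S_?) \cdot \min_{s \in S_?} \frac{\mathrm{Pr}^{\sigma}_s(\Diamond^{\le k} G)}{1 - \mathrm{Pr}^{\sigma}_s(\Box^{\le k} S_?)}
\;\le\; \mathrm{Pr}^{\max}(\Diamond G) \;\le\; \mathrm{Pr}^{\sigma}(\Diamond^{\le k} G) + \mathrm{Pr}^{\sigma}(\Box^{\le k} S_?) \cdot u.$$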


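Similar to the results for MCs it also holds that $\mathrm{Pr}^{\max}(\Diamond G) \le \max_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \hat{u}^{\sigma}_k$
with

$$\hat{u}^{\sigma}_k := \mathrm{Pr}^{\sigma}(\Diamond^{\le k} G) + \mathrm{Pr}^{\sigma}(\Box^{\le k} S_?) \cdot \max_{s \in S_?} \frac{\mathrm{Pr}^{\sigma}_s(\Diamond^{\le k} G)}{1 - \mathrm{Pr}^{\sigma}_s(\Box^{\le k} S_?)}.$$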
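                      $\mathrm{Pr}^{\sigma_\alpha}_{s_0}$   $\mathrm{Pr}^{\sigma_{\beta\alpha}}_{s_0}$   $\mathrm{Pr}^{\sigma_{\beta\beta}}_{s_0}$   $\mathrm{Pr}_{s_1}$   $\mathrm{Pr}_{s_2}$
$\Diamond^{\le 1} G$        0       0.3     0.3     0.1     0.1
$\Box^{\le 1} S_?$          0.8     0.4     0.4     0.9     0
$\Diamond^{\le 2} G$        0.08    0.3     0.42    0.19    0.1
$\Box^{\le 2} S_?$          0.72    0.32    0.16    0       0


(a) Sample MDP $\mathcal{M}$ (figure not reproduced). (b) Step bounded probabilities for $\mathcal{M}$.


Fig. 2. Example MDP with corresponding step bounded probabilities.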


However, this upper bound cannot trivially be embedded in a value iteration
based procedure. Intuitively, in order to compute the upper bound for iteration
k, one cannot necessarily build on the results for iteration k − 1.


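Example 6. Consider the MDP $\mathcal{M}$ given in Fig. 2(a). Let $G = \{s_3, s_4\}$ be the
set of goal states. We therefore have $S_? = \{s_0, s_1, s_2\}$. In Fig. 2(b) we list step
bounded probabilities with respect to the possible schedulers, where $\sigma_\alpha$, $\sigma_{\beta\alpha}$,
and $\sigma_{\beta\beta}$ refer to schedulers with $\sigma_\alpha(s_0) = \alpha$ and for $\gamma \in \{\alpha, \beta\}$, $\sigma_{\beta\gamma}(s_0) = \beta$
and $\sigma_{\beta\gamma}(s_0 \beta s_0) = \gamma$. Notice that the probability measures $\mathrm{Pr}_{s_1}$ and $\mathrm{Pr}_{s_2}$ are
independent of the considered scheduler $\sigma$. For step bounds $k \in \{1, 2\}$ we get


- $\max_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \hat{u}^{\sigma}_1 = \hat{u}^{\sigma_\alpha}_1 = 0 + 0.8 \cdot \max(0, 1, 0.1) = 0.8$ and
- $\max_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \hat{u}^{\sigma}_2 = \hat{u}^{\sigma_{\beta\beta}}_2 = 0.42 + 0.16 \cdot \max(0.5, 0.19, 0.1) = 0.5$.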


4.2 Extending the Value Iteration Approach 


The idea of our algorithm is to compute the bounds for $\mathrm{Pr}^{\max}(\Diamond G)$ as in Theorem 4
for increasing $k \ge 0$. Algorithm 2 outlines the procedure. Similar to
Algorithm 1 for MCs, vectors $x_k, y_k \in \mathbb{R}^{|S|}$ store the step bounded probabilities
$\mathrm{Pr}^{\sigma_k}_s(\Diamond^{\le k} G)$ and $\mathrm{Pr}^{\sigma_k}_s(\Box^{\le k} S_?)$ for any $s \in S$. In addition, schedulers $\sigma_k$ and
upper bounds $u_k \ge \max_{s \in S_?} \mathrm{Pr}^{\max}_s(\Diamond G)$ are computed in a way that Theorem 4
is applicable.


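Lemma 3. After executing k iterations of Algorithm 2 we have for all $s \in S_?$
that $x_k[s] = \mathrm{Pr}^{\sigma_k}_s(\Diamond^{\le k} G)$, $y_k[s] = \mathrm{Pr}^{\sigma_k}_s(\Box^{\le k} S_?)$, and $\ell_k \le \mathrm{Pr}^{\max}_s(\Diamond G) \le u_k$,
where $\sigma_k \in \arg\max_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \mathrm{Pr}^{\sigma}_s(\Diamond^{\le k} G) + \mathrm{Pr}^{\sigma}_s(\Box^{\le k} S_?) \cdot u_k$.


The lemma holds for k = 0 as $x_0$, $y_0$, and $u_0$ are initialized accordingly. For
k > 0 we assume that the claim holds after k − 1 iterations, i.e., for $x_{k-1}$, $y_{k-1}$,
$u_{k-1}$ and scheduler $\sigma_{k-1}$. The results of the kth iteration are obtained as follows.

The function findAction illustrated in Algorithm 3 determines the choices of
a scheduler $\sigma_k \in \arg\max_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \mathrm{Pr}^{\sigma}_s(\Diamond^{\le k} G) + \mathrm{Pr}^{\sigma}_s(\Box^{\le k} S_?) \cdot u_{k-1}$ for $s \in S_?$. The
idea is to consider at state s an action $\sigma_k(s) = \alpha \in Act(s)$ that maximizes

$$\mathrm{Pr}^{\sigma_k}_s(\Diamond^{\le k} G) + \mathrm{Pr}^{\sigma_k}_s(\Box^{\le k} S_?) \cdot u_{k-1} = \sum_{s' \in S} P(s, \alpha, s') \cdot (x_{k-1}[s'] + y_{k-1}[s'] \cdot u_{k-1}).$$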
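Input : MDP $\mathcal{M} = (S, Act, P, s_I, \rho)$, absorbing states $G \subseteq S$, precision $\varepsilon > 0$
Output : $r \in \mathbb{R}$ with $|r - \mathrm{Pr}^{\max}(\Diamond G)| < \varepsilon$

 1  $S_0 \leftarrow \{s \in S \mid \mathrm{Pr}^{\max}_s(\Diamond G) = 0\}$
 2  assert that $\mathcal{M}$ is contracting with respect to $G \cup S_0$
 3  $S_? \leftarrow S \setminus (S_0 \cup G)$
 4  initialize $x_0, y_0 \in \mathbb{R}^{|S|}$ with $x_0[G] = 1$, $x_0[S \setminus G] = 0$, $y_0[S_?] = 1$, $y_0[S \setminus S_?] = 0$
 5  $\ell_0 \leftarrow -\infty$; $u_0 \leftarrow +\infty$; $d_0 \leftarrow -\infty$
 6  $k \leftarrow 0$
 7  repeat
 8      $k \leftarrow k + 1$
 9      initialize $x_k, y_k \in \mathbb{R}^{|S|}$ with $x_k[G] = 1$, $x_k[S_0] = 0$, $y_k[S \setminus S_?] = 0$
10      $d_k \leftarrow d_{k-1}$
11      foreach $s \in S_?$ do
12          $\alpha \leftarrow$ findAction($x_{k-1}$, $y_{k-1}$, s, $u_{k-1}$)
13          $d_k \leftarrow \max(d_k,$ decisionValue($x_{k-1}$, $y_{k-1}$, s, $\alpha$))
14          $x_k[s] \leftarrow \sum_{s' \in S} P(s, \alpha, s') \cdot x_{k-1}[s']$
15          $y_k[s] \leftarrow \sum_{s' \in S} P(s, \alpha, s') \cdot y_{k-1}[s']$
16      if $y_k[s] < 1$ for all $s \in S_?$ then
17          $\ell_k \leftarrow \max(\ell_{k-1}, \min_{s \in S_?} \frac{x_k[s]}{1 - y_k[s]})$
18          $u_k \leftarrow \min(u_{k-1}, \max(d_k, \max_{s \in S_?} \frac{x_k[s]}{1 - y_k[s]}))$
19  until $y_k[s_I] \cdot (u_k - \ell_k) < 2 \cdot \varepsilon$
20  return $x_k[s_I] + y_k[s_I] \cdot \frac{\ell_k + u_k}{2}$


Algorithm 2: Sound value iteration for MDPs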


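For the case where no real upper bound is known (i.e., $u_{k-1} = \infty$) we implicitly
assume a sufficiently large value for $u_{k-1}$ such that $\mathrm{Pr}^{\sigma}_s(\Diamond^{\le k} G)$ becomes negligible.
Upon leaving state s, $\sigma_k$ mimics $\sigma_{k-1}$, i.e., we set
$\sigma_k(s \alpha s_1 \alpha_1 \dots s_n) = \sigma_{k-1}(s_1 \alpha_1 \dots s_n)$. After executing Line 15 of Algorithm 2 we have
$x_k[s] = \mathrm{Pr}^{\sigma_k}_s(\Diamond^{\le k} G)$ and $y_k[s] = \mathrm{Pr}^{\sigma_k}_s(\Box^{\le k} S_?)$.

It remains to derive an upper bound $u_k$. To ensure that Lemma 3 holds we
require (i) $u_k \ge \max_{s \in S_?} \mathrm{Pr}^{\max}_s(\Diamond G)$ and (ii) $u_k \in U_k$, where

$$U_k = \{u \in \mathbb{R} \mid \sigma_k \in \arg\max_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \mathrm{Pr}^{\sigma}_s(\Diamond^{\le k} G) + \mathrm{Pr}^{\sigma}_s(\Box^{\le k} S_?) \cdot u \text{ for all } s \in S_?\}.$$

Intuitively, the set $U_k \subseteq \mathbb{R}$ consists of all possible upper bounds u for which
$\sigma_k$ is still optimal. $U_k$ is convex as it can be represented as a conjunction of
inequalities with $U_0 = \mathbb{R}$ and $u \in U_k$ if and only if $u \in U_{k-1}$ and for all $s \in S_?$
with $\sigma_k(s) = \alpha$ and for all $\beta \in Act(s) \setminus \{\alpha\}$

$$\sum_{s' \in S} P(s, \alpha, s') \cdot (x_{k-1}[s'] + y_{k-1}[s'] \cdot u) \;\ge\; \sum_{s' \in S} P(s, \beta, s') \cdot (x_{k-1}[s'] + y_{k-1}[s'] \cdot u).$$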


sS ES s'ES 


The algorithm maintains the so-called decision value $d_k$, which corresponds to the
minimum of $U_k$ (or $-\infty$ if the minimum does not exist). Algorithm 4 outlines the
function findAction(x, y, s, u)
    if $u \ne \infty$ then
        return $\alpha \in \arg\max_{\alpha \in Act(s)} \sum_{s' \in S} P(s, \alpha, s') \cdot (x[s'] + y[s'] \cdot u)$
    else
        return $\alpha \in \arg\max_{\alpha \in Act(s)} \sum_{s' \in S} P(s, \alpha, s') \cdot y[s']$

Algorithm 3: Computation of optimal action.


function decisionValue(x, y, s, $\alpha$)
    $d \leftarrow -\infty$
    foreach $\beta \in Act(s) \setminus \{\alpha\}$ do
        $y_\Delta \leftarrow \sum_{s' \in S} (P(s, \alpha, s') - P(s, \beta, s')) \cdot y[s']$
        if $y_\Delta > 0$ then
            $x_\Delta \leftarrow \sum_{s' \in S} (P(s, \beta, s') - P(s, \alpha, s')) \cdot x[s']$
            $d \leftarrow \max(d, x_\Delta / y_\Delta)$
    return d

Algorithm 4: Computation of decision value.


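procedure to obtain the decision value at a given state. Our algorithm ensures
that $u_k$ is only set to a value in $[d_k, u_{k-1}] \subseteq U_k$.


Lemma 4. After executing Line 18 of Algorithm 2: $u_k \ge \max_{s \in S_?} \mathrm{Pr}^{\max}_s(\Diamond G)$.


To show that $u_k$ is a valid upper bound, let $s_{\max} \in \arg\max_{s \in S_?} \mathrm{Pr}^{\max}_s(\Diamond G)$ and
$u^* = \mathrm{Pr}^{\max}_{s_{\max}}(\Diamond G)$. From Theorem 4, $u_{k-1} \ge u^*$, and $u_{k-1} \in U_k$ we get

$$u^* \;\le\; \max_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \mathrm{Pr}^{\sigma}_{s_{\max}}(\Diamond^{\le k} G) + \mathrm{Pr}^{\sigma}_{s_{\max}}(\Box^{\le k} S_?) \cdot u_{k-1}
= \mathrm{Pr}^{\sigma_k}_{s_{\max}}(\Diamond^{\le k} G) + \mathrm{Pr}^{\sigma_k}_{s_{\max}}(\Box^{\le k} S_?) \cdot u_{k-1} = x_k[s_{\max}] + y_k[s_{\max}] \cdot u_{k-1}$$

which yields a new upper bound $x_k[s_{\max}] + y_k[s_{\max}] \cdot u_{k-1} \ge u^*$. We repeat this
scheme as follows. Let $v_0 := u_{k-1}$ and for i > 0 let $v_i := x_k[s_{\max}] + y_k[s_{\max}] \cdot v_{i-1}$.
We can show that $v_{i-1} \in U_k$ implies $v_i \ge u^*$. Assuming $y_k[s_{\max}] < 1$, the
sequence $v_0, v_1, v_2, \dots$ converges to $v_\infty := \lim_{i \to \infty} v_i = \frac{x_k[s_{\max}]}{1 - y_k[s_{\max}]}$. We distinguish
three cases to show that $u_k = \min(u_{k-1}, \max(d_k, \max_{s \in S_?} \frac{x_k[s]}{1 - y_k[s]})) \ge u^*$.


- If $v_\infty > u_{k-1}$, then also $\max_{s \in S_?} \frac{x_k[s]}{1 - y_k[s]} > u_{k-1}$. Hence $u_k = u_{k-1} \ge u^*$.

- If $d_k \le v_\infty \le u_{k-1}$, we can show that $v_i \le v_{i-1}$. It follows that for all i > 0,
  $v_{i-1} \in U_k$, implying $v_i \ge u^*$. Thus we get $u_k = \max_{s \in S_?} \frac{x_k[s]}{1 - y_k[s]} \ge v_\infty \ge u^*$.

- If $v_\infty < d_k$ then there is an $i \ge 0$ with $v_i \ge d_k$ and $u^* \le v_{i+1} < d_k$. It follows
  that $u_k = d_k \ge u^*$.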
Example 7. Reconsider the MDP $\mathcal{M}$ from Fig. 2(a) and goal states $G = \{s_3, s_4\}$.
The maximal reachability probability is attained for a scheduler that always
chooses $\beta$ at state $s_0$, which results in $\mathrm{Pr}^{\max}(\Diamond G) = 0.5$. We now illustrate how
Algorithm 2 approximates this value by sketching the first two iterations. For
the first iteration findAction yields action $\alpha$ at $s_0$. We obtain:


$x_1[s_0] = 0$, $x_1[s_1] = 0.1$, $x_1[s_2] = 0.1$, $y_1[s_0] = 0.8$, $y_1[s_1] = 0.9$, $y_1[s_2] = 0$,
$d_1 = 0.3/(0.8 - 0.4) = 0.75$, $\ell_1 = \min(0, 1, 0.1) = 0$, $u_1 = \max(0.75, 0, 1, 0.1) = 1$.


In the second iteration findAction yields again $\alpha$ for $s_0$ and we get:


$x_2[s_0] = 0.08$, $x_2[s_1] = 0.19$, $x_2[s_2] = 0.1$, $y_2[s_0] = 0.72$, $y_2[s_1] = 0$, $y_2[s_2] = 0$,
$d_2 = 0.75$, $\ell_2 = \min(0.29, 0.19, 0.1) = 0.1$, $u_2 = \max(0.75, 0.29, 0.19, 0.1) = 0.75$.


Due to the decision value we do not set the upper bound $u_2$ to $0.29 < \mathrm{Pr}^{\max}(\Diamond G)$.


Theorem 5. Algorithm 2 terminates for any MDP $\mathcal{M}$, goal states G and precision
$\varepsilon > 0$. The returned value r satisfies $|r - \mathrm{Pr}^{\max}(\Diamond G)| < \varepsilon$.


The correctness of the algorithm follows from Theorem 4 and Lemma 3. Termination
follows since $\mathcal{M}$ is contracting with respect to $S_0 \cup G$, implying
$\lim_{k \to \infty} \mathrm{Pr}^{\sigma}(\Box^{\le k} S_?) = 0$. The optimizations for Algorithm 1 mentioned in
Sect. 3.4 can be applied to Algorithm 2 as well.
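
Algorithms 2 to 4 can be combined into one compact Python sketch. Everything model-specific here is an assumption: the MDP `MDP` is hypothetical, with transition probabilities reconstructed so that the first two iterations reproduce the values of Example 7; the states `goal` and `sink` stand in for the goal set and for $S_0$; and the helper `expect` as well as the guard for `y[init] == 0.0` are additions of this sketch. It is meant to illustrate the control flow, not to mirror the Storm implementation.

```python
import math

# Hypothetical MDP (an assumption, chosen to be consistent with Example 7).
MDP = {
    "s0": {"alpha": {"s1": 0.8, "sink": 0.2},
           "beta": {"s0": 0.4, "goal": 0.3, "sink": 0.3}},
    "s1": {"a": {"goal": 0.1, "s2": 0.9}},
    "s2": {"a": {"goal": 0.1, "sink": 0.9}},
}
TRANSIENT = ["s0", "s1", "s2"]  # S_?


def expect(dist, v):
    """Expected value of vector v after one step with distribution dist."""
    return sum(p * v[t] for t, p in dist.items())


def find_action(x, y, s, u):
    acts = MDP[s]
    if u != math.inf:
        return max(acts, key=lambda a: expect(acts[a], x) + expect(acts[a], y) * u)
    return max(acts, key=lambda a: expect(acts[a], y))


def decision_value(x, y, s, a):
    d = -math.inf
    for b in MDP[s]:
        if b == a:
            continue
        y_delta = expect(MDP[s][a], y) - expect(MDP[s][b], y)
        if y_delta > 0:
            x_delta = expect(MDP[s][b], x) - expect(MDP[s][a], x)
            d = max(d, x_delta / y_delta)
    return d


def sound_value_iteration_mdp(init, eps):
    x = {"s0": 0.0, "s1": 0.0, "s2": 0.0, "goal": 1.0, "sink": 0.0}
    y = {"s0": 1.0, "s1": 1.0, "s2": 1.0, "goal": 0.0, "sink": 0.0}
    lower, upper, d = -math.inf, math.inf, -math.inf
    while True:
        new_x, new_y = dict(x), dict(y)
        for s in TRANSIENT:
            a = find_action(x, y, s, upper)
            d = max(d, decision_value(x, y, s, a))  # d only ever grows
            new_x[s] = expect(MDP[s][a], x)
            new_y[s] = expect(MDP[s][a], y)
        x, y = new_x, new_y
        if all(y[s] < 1.0 for s in TRANSIENT):
            ratios = [x[s] / (1.0 - y[s]) for s in TRANSIENT]
            lower = max(lower, min(ratios))
            upper = min(upper, max(d, max(ratios)))
        if y[init] == 0.0:
            return x[init]
        if y[init] * (upper - lower) < 2 * eps:
            return x[init] + y[init] * (lower + upper) / 2


result = sound_value_iteration_mdp("s0", 1e-6)
print(result)
```

In this model the decision value keeps the upper bound at about 0.75 after the second iteration, after which the action at `s0` switches from `alpha` to `beta` and the result approaches the maximal reachability probability 0.5.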
[Figure: scatter plots, not reproduced. (a) Model checking times (in seconds). (b) Required iterations.]


Fig. 3. Comparison of sound value iteration (x-axis) and interval iteration (y-axis).
5 Experimental Evaluation


Implementation. We implemented sound value iteration for MCs and MDPs
into the model checker Storm [8]. The implementation computes reachability
probabilities and expected rewards using explicit data structures such as sparse
matrices and vectors. Moreover, multi-objective model checking is supported,
where we straightforwardly extend the value iteration-based approach of [22] to
sound value iteration. We also implemented the optimizations given in Sect. 3.4.
The implementation is available at www.stormchecker.org.


Experimental Results. We considered a wide range of case studies including


- all MCs, MDPs, and CTMCs from the PRISM benchmark suite [23],
- several case studies from the PRISM website www.prismmodelchecker.org,
- Markov automata accompanying IMCA [24], and
- multi-objective MDPs considered in [22].


In total, 130 model and property instances were considered. For CTMCs and 
Markov automata we computed (untimed) reachability probabilities or expected 
rewards on the underlying MC and the underlying MDP, respectively. In all 
experiments the precision parameter was given by $\varepsilon = 10^{-6}$.

We compare sound value iteration (SVI) with interval iteration (II) as presented
in [12,13]. We consider the Gauss-Seidel variant of the approaches and
compute initial bounds $\ell_0$ and $u_0$ as in [12]. For a better comparison we consider
the implementation of II in Storm. [21] gives a comparison with the implementation
of II in PRISM. The experiments were run on a single core (2GHz) of an
HP BL685C G7 with 192GB of available memory. However, almost all experi- 
ments required less than 4GB. We measured model checking times and required 
iterations. All logfiles and considered benchmarks are available at [25]. 

Figure 3(a) depicts the model checking times for SVI (x-axis) and II (y-axis).
For better readability, the benchmarks are divided into four plots with different
scales. Triangles (▲) and circles (●) indicate MC and MDP benchmarks, respectively.
Similarly, Fig. 3(b) shows the required iterations of the approaches. We
observe that SVI converged faster and required fewer iterations for almost all 
MCs and MDPs. SVI performed particularly well on the challenging instances 
where many iterations are required. Similar observations were made when com- 
paring the topological variants of SVI and II. Both approaches were still com- 
petitive if no a priori bounds are given to SVI. More details are given in [21]. 

Figure 4 indicates the model checking times of SVI and II as well as their
topological variants. For reference, we also consider standard (unsound) value
iteration (VI). The x-axis depicts the number of instances that have been solved
by the corresponding approach within the time limit indicated on the y-axis.
Hence, a point (x, y) means that for x instances the model checking time was less
than or equal to y. We observe that the topological variant of SVI yielded the best
run times among all sound approaches and even competes with (unsound) VI.
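
The reading of such a plot can be made concrete with a small sketch. The function name `cactus_points` and the runtimes below are made up for illustration; they are not measurements from the experiments.

```python
def cactus_points(runtimes):
    """Return points (x, y): x instances were each solved within y seconds."""
    return [(i + 1, t) for i, t in enumerate(sorted(runtimes))]


# Hypothetical runtimes in seconds (illustrative numbers only).
points = cactus_points([4.0, 0.5, 12.0, 1.5])
print(points)
```

A curve that stays lower and extends further to the right therefore indicates an approach that solves more instances within a given time limit.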
Fig. 4. Runtime comparison between different approaches.


6 Conclusion 


In this paper we presented a sound variant of the value iteration algorithm which 
safely approximates reachability probabilities and expected rewards in MCs and 
MDPs. Experiments on a large set of benchmarks indicate that our approach is 
a reasonable alternative to the recently proposed interval iteration algorithm. 
Abstract. Apprenticeship learning (AL) is a kind of Learning from
Demonstration technique where the reward function of a Markov Decision
Process (MDP) is unknown to the learning agent and the agent has
to derive a good policy by observing an expert's demonstrations. In this
paper, we study the problem of how to make AL algorithms inherently
safe while still meeting their learning objective. We consider a setting where
the unknown reward function is assumed to be a linear combination of
a set of state features, and the safety property is specified in Probabilistic
Computation Tree Logic (PCTL). By embedding probabilistic model
checking inside AL, we propose a novel counterexample-guided approach
that can ensure safety while retaining performance of the learnt policy.
We demonstrate the effectiveness of our approach on several challenging
AL scenarios where safety is essential.


1 Introduction 


The rapid progress of artificial intelligence (AI) comes with a growing concern
over its safety when deployed in real-life systems and situations. As highlighted in 
[3], if the objective function of an AI agent is wrongly specified, then maximizing 
that objective function may lead to harmful results. In addition, the objective 
function or the training data may focus only on accomplishing a specific task and 
ignore other aspects, such as safety constraints, of the environment. In this paper, 
we propose a novel framework that combines explicit safety specification with 
learning from data. We consider safety specification expressed in Probabilistic 
Computation Tree Logic (PCTL) and show how probabilistic model checking 
can be used to ensure safety and retain performance of a learning algorithm 
known as apprenticeship learning (AL). 

We consider the formulation of apprenticeship learning by Abbeel and Ng [1]. 
The concept of AL is closely related to reinforcement learning (RL) where an 
agent learns what actions to take in an environment (known as a policy) by 
maximizing some notion of long-term reward. In AL, however, the agent is not 
given the reward function, but instead has to first estimate it from a set of expert 
demonstrations via a technique called inverse reinforcement learning [18]. The 
formulation assumes that the reward function is expressible as a linear combina- 
tion of known state features. An expert demonstrates the task by maximizing this 
reward function and the agent tries to derive a policy that can match the feature 
expectations of the expert's demonstrations. Apprenticeship learning can also be 
viewed as an instance of the class of techniques known as Learning from Demonstration
(LfD). One issue with LfD is that the expert often can only demonstrate
how the task works but not how the task may fail. This is because failure may
cause irrecoverable damage to the system such as crashing a vehicle. In general,
the lack of “negative examples” can cause a heavy bias in how the learning agent 
constructs the reward estimate. In fact, even if all the demonstrations are safe, 
the agent may still end up learning an unsafe policy. 

The key idea of this paper is to incorporate formal verification in appren- 
ticeship learning. We are inspired by the line of work on formal inductive syn- 
thesis [10] and counterexample-guided inductive synthesis [22]. Our approach 
is also similar in spirit to the recent work on safety-constrained reinforcement 
learning [11]. However, our approach uses the results of model checking in a 
novel way. We consider safety specification expressed in probabilistic computa- 
tion tree logic (PCTL). We employ a verification-in-the-loop approach by embed- 
ding PCTL model checking as a safety checking mechanism inside the learning 
phase of AL. In particular, when a learnt policy does not satisfy the PCTL for- 
mula, we leverage counterexamples generated by the model checker to steer the 
policy search in AL. In essence, counterexample generation can be viewed as 
supplementing negative examples for the learner. Thus, the learner will try to 
find a policy that not only imitates the expert’s demonstrations but also stays 
away from the failure scenarios as captured by the counterexamples. 

In summary, we make the following contributions in this paper. 


— We propose a novel framework for incorporating formal safety guarantees in 
Learning from Demonstration. 

— We develop a novel algorithm called CounterExample Guided Apprenticeship 
Learning (CEGAL) that combines probabilistic model checking with the 
optimization-based approach of apprenticeship learning. 

— We demonstrate that our approach can guarantee safety for a set of case 
studies and attain performance comparable to that of using apprenticeship 
learning alone. 


The rest of the paper is organized as follows. Section 2 reviews background 
information on apprenticeship learning and PCTL model checking. Section 3 
defines the safety-aware apprenticeship learning problem and gives an overview 
of our approach. Section 4 illustrates the counterexample-guided learning framework.
Section 5 describes the proposed algorithm in detail. Section 6 presents
a set of experimental results demonstrating the effectiveness of our approach. 
Section 7 discusses related work. Section 8 concludes and offers future directions. 


2 Preliminaries 


2.1 Markov Decision Process and Discrete-Time Markov Chain 


A Markov Decision Process (MDP) is a tuple $M = (S, A, T, \gamma, s_0, R)$, where $S$
is a finite set of states; $A$ is a set of actions; $T : S \times A \times S \to [0,1]$ is a
transition function describing the probability of transitioning from one state
$s \in S$ to another state by taking action $a \in A$ in state $s$; $R : S \to \mathbb{R}$ is a reward
function which maps each state $s \in S$ to a real number indicating the reward
of being in state $s$; $s_0 \in S$ is the initial state; $\gamma \in [0,1)$ is a discount factor
which describes how future rewards attenuate when a sequence of transitions is
made. A deterministic and stationary (or memoryless) policy $\pi : S \to A$ for an
MDP $M$ is a mapping from states to actions, i.e. the policy deterministically
selects what action to take solely based on the current state. In this paper, we
restrict ourselves to deterministic and stationary policies. A policy $\pi$ for an MDP
$M$ induces a Discrete-Time Markov Chain (DTMC) $M_\pi = (S, T_\pi, s_0)$, where
$T_\pi : S \times S \to [0,1]$ is the probability of transitioning from a state $s$ to another
state in one step. A trajectory
$\tau = s_0 \xrightarrow{T_\pi(s_0,s_1)>0} s_1 \xrightarrow{T_\pi(s_1,s_2)>0} s_2 \ldots$
is a sequence of states where $s_i \in S$. The accumulated reward of $\tau$ is
$\sum_{i=0}^{\infty} \gamma^i R(s_i)$. The value function $V_\pi : S \to \mathbb{R}$ measures the
expectation of accumulated reward $E[\sum_{i=0}^{\infty} \gamma^i R(s_i)]$ starting from a state $s$
and following policy $\pi$. An optimal policy $\pi$ for MDP $M$ is a policy that
maximizes the value function [4].
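The definitions above can be made concrete with a small sketch. The three-state MDP, the names `induced_dtmc` and `evaluate_policy`, and the fixed number of sweeps are illustrative assumptions and are not taken from the paper; the code only shows how a stationary policy turns an MDP into a DTMC and how the value function can be approximated by repeated backups.

```python
from typing import List

# Hypothetical 3-state, 2-action MDP used only for illustration.
# T[s][a][s2] is the probability of moving from s to s2 under action a.
T: List[List[List[float]]] = [
    [[0.9, 0.1, 0.0], [0.2, 0.0, 0.8]],  # state 0
    [[0.0, 1.0, 0.0], [0.0, 0.5, 0.5]],  # state 1
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # state 2 (absorbing)
]
R: List[float] = [0.0, 0.0, 1.0]  # state rewards R(s)
GAMMA = 0.9                       # discount factor


def induced_dtmc(T: List[List[List[float]]], policy: List[int]) -> List[List[float]]:
    """Return T_pi with T_pi[s][s2] = T[s][policy[s]][s2]."""
    return [T[s][policy[s]] for s in range(len(T))]


def evaluate_policy(T_pi: List[List[float]], R: List[float],
                    gamma: float, sweeps: int = 2000) -> List[float]:
    """Approximate V_pi by iterating V(s) = R(s) + gamma * sum_s2 T_pi[s][s2] * V(s2)."""
    n = len(R)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [R[s] + gamma * sum(T_pi[s][s2] * V[s2] for s2 in range(n))
             for s in range(n)]
    return V
```

Because $\gamma < 1$, the backup is a contraction, so a fixed number of sweeps is enough for a small example; a tolerance-based stopping rule would be the usual choice in practice.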


2.2 Apprenticeship Learning via Inverse Reinforcement Learning 


Inverse reinforcement learning (IRL) aims at recovering the reward function $R$
of $M \setminus R = (S, A, T, \gamma, s_0)$ from a set of $m$ trajectories
$\Gamma_E = \{\tau_0, \tau_1, \ldots, \tau_{m-1}\}$
demonstrated by an expert. Apprenticeship learning (AL) [1] assumes that the
reward function is a linear combination of state features, i.e. $R(s) = \omega^T f(s)$
where $f : S \to [0,1]^k$ is a vector of known features over states $S$ and $\omega \in \mathbb{R}^k$ is an
unknown weight vector that satisfies $\|\omega\|_2 \le 1$. The expected features of a policy
$\pi$ are the expected values of the cumulative discounted state features $f(s)$ by
following $\pi$ on $M$, i.e. $\mu_\pi = E[\sum_{t=0}^{\infty} \gamma^t f(s_t) \mid \pi]$. Let $\mu_E$ denote the expected features
of the unknown expert's policy $\pi_E$. $\mu_E$ can be approximated by the expected
features of the expert's $m$ demonstrated trajectories
$\hat{\mu}_E = \frac{1}{m} \sum_{\tau \in \Gamma_E} \sum_{t=0}^{\infty} \gamma^t f(s_t)$ if $m$
is large enough. With a slight abuse of notation, we use $\mu_\Gamma$ to also denote the
expected features of a set of paths $\Gamma$. Given an error bound $\epsilon$, a policy $\pi^*$ is
defined to be $\epsilon$-close to $\pi_E$ if its expected features $\mu_{\pi^*}$ satisfy
$\|\mu_E - \mu_{\pi^*}\|_2 \le \epsilon$.
The expected features of a policy can be calculated by using the Monte Carlo
method, value iteration or linear programming [1,4].
The algorithm proposed by Abbeel and Ng [1] starts with a random policy
$\pi_0$ and its expected features $\mu_{\pi_0}$. Assuming that in iteration $i$, a set of $i$ candidate
policies $\Pi = \{\pi_0, \pi_1, \ldots, \pi_{i-1}\}$ and their corresponding expected features
$\{\mu_\pi \mid \pi \in \Pi\}$ have been found, the algorithm solves the following optimization
problem.

$$\delta = \max_{\omega} \min_{\pi \in \Pi} \omega^T (\hat{\mu}_E - \mu_\pi) \quad s.t.\ \|\omega\|_2 \le 1 \qquad (1)$$

The optimal $\omega$ is used to find the corresponding optimal policy $\pi_i$ and the
expected features $\mu_{\pi_i}$. If $\delta \le \epsilon$, then the algorithm terminates and $\pi_i$ is produced
as the output. Otherwise, $\mu_{\pi_i}$ is added to the set of features for the candidate
policy set $\Pi$ and the algorithm continues to the next iteration.
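As an illustration of the empirical estimate of the expert's expected features and of the $\epsilon$-closeness test, the following is a minimal sketch. The function names are hypothetical, and the infinite sum is truncated at the end of each finite demonstration, which is an assumption of this sketch rather than something stated in the text.

```python
import math
from typing import Callable, List, Sequence

Feature = Callable[[int], Sequence[float]]


def discounted_features(traj: Sequence[int], f: Feature, gamma: float) -> List[float]:
    """Sum of gamma^t * f(s_t) along one finite trajectory."""
    total = [0.0] * len(f(traj[0]))
    for t, s in enumerate(traj):
        weight = gamma ** t
        for j, value in enumerate(f(s)):
            total[j] += weight * value
    return total


def expected_features(trajs: Sequence[Sequence[int]], f: Feature, gamma: float) -> List[float]:
    """Average the discounted feature sums over the m demonstrated trajectories."""
    sums = [discounted_features(tr, f, gamma) for tr in trajs]
    return [sum(col) / len(trajs) for col in zip(*sums)]


def is_eps_close(mu_a: Sequence[float], mu_b: Sequence[float], eps: float) -> bool:
    """Check the 2-norm condition ||mu_a - mu_b||_2 <= eps."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))) <= eps
```

With one-hot features over two states, two demonstrations `[0, 1]` and `[0, 0]`, and a discount of 0.5, the estimate is the average of `[1.0, 0.5]` and `[1.5, 0.0]`.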


2.3 PCTL Model Checking 


Probabilistic model checking can be used to verify properties of a stochastic
system such as “is the probability that the agent reaches the unsafe area within
10 steps smaller than 5%?”. Probabilistic Computation Tree Logic (PCTL) [7]
allows for probabilistic quantification of properties. The syntax of PCTL includes
state formulas and path formulas [13]. A state formula $\phi$ asserts a property of a
single state $s \in S$ whereas a path formula $\psi$ asserts a property of a trajectory.

$$\phi ::= true \mid l_i \mid \neg\phi_i \mid \phi_i \wedge \phi_j \mid P_{\bowtie p^*}[\psi] \qquad (2)$$
$$\psi ::= X\phi \mid \phi_1 U^{\le k} \phi_2 \mid \phi_1 U \phi_2 \qquad (3)$$

where $l_i$ is an atomic proposition and $\phi_i, \phi_j$ are state formulas; $\bowtie \in \{\le, \ge, <, >\}$;
$P_{\bowtie p^*}[\psi]$ means that the probability of generating a trajectory that satisfies
formula $\psi$ is $\bowtie p^*$. $X\phi$ asserts that the next state after the initial state in the trajectory
satisfies $\phi$; $\phi_1 U^{\le k} \phi_2$ asserts that $\phi_2$ is satisfied in at most $k$ transitions and all
preceding states satisfy $\phi_1$; $\phi_1 U \phi_2$ asserts that $\phi_2$ will be eventually satisfied
and all preceding states satisfy $\phi_1$. The semantics of PCTL is defined by a
satisfaction relation $\models$ as follows.

$$s \models true \quad \text{iff state } s \in S \qquad (4)$$
$$s \models \phi \quad \text{iff state } s \text{ satisfies the state formula } \phi \qquad (5)$$
$$\tau \models \psi \quad \text{iff trajectory } \tau \text{ satisfies the path formula } \psi \qquad (6)$$

Additionally, $\models_{min}$ denotes the minimal satisfaction relation [6] between $\tau$
and $\psi$. Defining $pref(\tau)$ as the set of all prefixes of trajectory $\tau$ including $\tau$
itself, then $\tau \models_{min} \psi$ iff $(\tau \models \psi) \wedge (\forall \tau' \in pref(\tau) \setminus \tau,\ \tau' \not\models \psi)$. For instance,
if $\psi = \phi_1 U^{\le k} \phi_2$, then for any finite trajectory $\tau \models_{min} \phi_1 U^{\le k} \phi_2$, only the
final state in $\tau$ satisfies $\phi_2$. Let $P(\tau)$ be the probability of transitioning along a
trajectory $\tau$ and let $\Gamma_\psi$ be the set of all finite trajectories that satisfy $\tau \models_{min} \psi$;
the value of PCTL property $\psi$ is defined as $P_{=?|s_0}[\psi] = \sum_{\tau \in \Gamma_\psi} P(\tau)$. For a DTMC
$M_\pi$ and a state formula $\phi = P_{\le p^*}[\psi]$, $M_\pi \models \phi$ iff $P_{=?|s_0}[\psi] \le p^*$.
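For the step-bounded reachability properties used later in the paper, the value $P_{=?|s_0}[true\ U^{\le t}\ unsafe]$ on a DTMC can be computed by a simple backward recursion. The sketch below is only an illustration of that computation on a hypothetical three-state chain; it is not how PRISM is implemented, and the names `bounded_reach_prob` and `satisfies` are invented for this example.

```python
from typing import List, Set

# Hypothetical DTMC: state 1 is unsafe, state 2 is an absorbing goal.
DTMC: List[List[float]] = [
    [0.5, 0.1, 0.4],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def bounded_reach_prob(T_pi: List[List[float]], unsafe: Set[int], s0: int, t: int) -> float:
    """Probability of visiting an unsafe state within t transitions from s0."""
    n = len(T_pi)
    # With 0 steps left, only states that are already unsafe count.
    p = [1.0 if s in unsafe else 0.0 for s in range(n)]
    for _ in range(t):
        p = [1.0 if s in unsafe
             else sum(T_pi[s][s2] * p[s2] for s2 in range(n))
             for s in range(n)]
    return p[s0]


def satisfies(T_pi: List[List[float]], unsafe: Set[int], s0: int, t: int, p_star: float) -> bool:
    """Check the state formula P_{<= p*}[true U^{<=t} unsafe] at s0."""
    return bounded_reach_prob(T_pi, unsafe, s0, t) <= p_star
```

On the example chain with $t = 2$, the unsafe state is reached directly with probability 0.1 or after one self-loop with probability 0.05, for a total of 0.15.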

A counterexample of $\phi$ is a set $cex \subseteq \Gamma_\psi$ that satisfies $\sum_{\tau \in cex} P(\tau) > p^*$.
Let $P(\Gamma) = \sum_{\tau \in \Gamma} P(\tau)$ be the sum of probabilities of all trajectories in a set $\Gamma$.
Let $CEX_\phi \subseteq 2^{\Gamma_\psi}$ be the set of all counterexamples for a formula $\phi$ such that
$(\forall cex \in CEX_\phi,\ P(cex) > p^*)$ and $(\forall \Gamma \in 2^{\Gamma_\psi} \setminus CEX_\phi,\ P(\Gamma) \le p^*)$. A minimal
counterexample is a set $cex \in CEX_\phi$ such that $\forall cex' \in CEX_\phi,\ |cex| \le |cex'|$.
By converting DTMC $M_\pi$ into a weighted directed graph, a counterexample can
be found by solving a k-shortest paths (KSP) problem or a hop-constrained
KSP (HKSP) problem [6]. Alternatively, counterexamples can be found by using
Satisfiability Modulo Theory solving or mixed integer linear programming to
determine the minimal critical subsystems that capture the counterexamples in
$M_\pi$ [23].
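To make the notion of a minimal counterexample concrete, the sketch below enumerates minimally satisfying paths in order of decreasing probability and stops once their total mass exceeds $p^*$. This best-first search is a simplified stand-in for the KSP/HKSP formulation of [6] and is not the algorithm used by any particular tool; the chain and the function name `counterexample` are assumptions made for illustration.

```python
import heapq
from typing import List, Optional, Set, Tuple

# Hypothetical DTMC: state 1 is unsafe, state 2 is an absorbing goal.
DTMC: List[List[float]] = [
    [0.5, 0.1, 0.4],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

Path = Tuple[int, ...]


def counterexample(T_pi: List[List[float]], unsafe: Set[int], s0: int,
                   t: int, p_star: float) -> Optional[List[Tuple[Path, float]]]:
    """Collect the most probable paths reaching `unsafe` within t steps
    until their probability mass exceeds p_star; None if that never happens."""
    heap: List[Tuple[float, Path]] = [(-1.0, (s0,))]  # max-heap via negated probability
    cex: List[Tuple[Path, float]] = []
    mass = 0.0
    while heap and mass <= p_star:
        neg_p, path = heapq.heappop(heap)
        s = path[-1]
        if s in unsafe:
            # Minimal satisfaction: stop at the first unsafe state on the path.
            cex.append((path, -neg_p))
            mass += -neg_p
            continue
        if len(path) - 1 >= t:
            continue  # hop bound reached without hitting an unsafe state
        for s2, pr in enumerate(T_pi[s]):
            if pr > 0.0:
                heapq.heappush(heap, (neg_p * pr, path + (s2,)))
    return cex if mass > p_star else None
```

Since extending a path can only lower its probability, complete paths leave the heap in non-increasing probability order, so the returned set uses as few paths as possible for the given bound.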
A policy can also be synthesized by solving the objective $\min_\pi P_{=?}[\psi]$ for an
MDP $M$. This problem can be solved by linear programming or policy iteration
(and value iteration for step-bounded reachability) [14].
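A minimal sketch of value iteration for the step-bounded case follows. The MDP and the name `min_bounded_reach` are hypothetical. Note that for step-bounded objectives the minimizing action can depend on the number of remaining steps; the sketch returns only the action chosen with all $t$ steps remaining, so reading it as a stationary policy is a simplification.

```python
from typing import List, Set, Tuple

# Hypothetical MDP: state 1 is unsafe, state 2 is an absorbing goal.
# In state 0, action 0 gambles once (50% unsafe, 50% goal) while
# action 1 mostly waits in place with a 10% risk per step.
MDP: List[List[List[float]]] = [
    [[0.0, 0.5, 0.5], [0.9, 0.1, 0.0]],
    [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]


def min_bounded_reach(T: List[List[List[float]]], unsafe: Set[int],
                      t: int) -> Tuple[List[int], List[float]]:
    """Minimal probability of reaching `unsafe` within t steps, per state,
    together with the action chosen when t steps remain."""
    n = len(T)
    p = [1.0 if s in unsafe else 0.0 for s in range(n)]
    first_action = [0] * n
    for _ in range(t):
        nxt = []
        for s in range(n):
            if s in unsafe:
                nxt.append(1.0)
                continue
            q = [sum(T[s][a][s2] * p[s2] for s2 in range(n))
                 for a in range(len(T[s]))]
            best = min(range(len(q)), key=q.__getitem__)
            first_action[s] = best
            nxt.append(q[best])
        p = nxt
    return first_action, p
```

In the example, waiting is safer for short horizons, but once the horizon is long enough the accumulated risk of waiting exceeds 0.5 and the one-shot gamble becomes the minimizer.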


3 Problem Formulation and Overview 


Suppose there are some unsafe states in an MDP\R $M = (S, A, T, \gamma, s_0)$. A
safety issue in apprenticeship learning means that an agent following the learnt
policy would have a higher probability of entering those unsafe states than it
should. There are multiple reasons that can give rise to this issue. First, it
is possible that the expert policy $\pi_E$ itself has a high probability of reaching
the unsafe states. Second, human experts often tend to perform only successful 
demonstrations that do not highlight the unwanted situations [21]. This lack of 
negative examples in the training set can cause the learning agent to be unaware 
of the existence of those unsafe states. 


Fig. 1. The 8 x 8 grid-world. (a) Lighter grid cells have higher rewards than the darker 
ones. The two black grid cells have the lowest rewards, while the two white ones have 
the highest rewards. The grid cells enclosed by red lines are considered unsafe. (b) 
The blue line is an example trajectory demonstrated by the expert. (c) Only the goal 
states are assigned high rewards and there is little difference between the unsafe states 
and their nearby states. As a result, the learnt policy has a high probability of passing 
through the unsafe states as indicated by the cyan line. (d) $p^* = 20\%$. The learnt policy
is optimal to a reward function that correctly assigns low rewards to the unsafe states. 
(Color figure online) 


We use an 8 × 8 grid-world navigation example as shown in Fig. 1 to illustrate
this problem. An agent starts from the upper-left corner and moves from cell to
cell until it reaches the lower-right corner. The ‘unsafe’ cells are enclosed by the
red lines. These represent regions that the agent should avoid. In each step, the
agent can choose to stay in the current cell or move to an adjacent cell but with a 20%
chance of moving randomly instead of following its decision. The goal area, the
unsafe area and the reward mapping for all states are shown in Fig. 1(a). For
each state $s \in S$, its feature vector consists of 4 radial basis feature functions
with respect to the squared Euclidean distances between $s$ and the 4 states with
the highest or lowest rewards as shown in Fig. 1(a). In addition, a specification
$\Phi$ formalized in PCTL is used to capture the safety requirement. In (7), $p^*$ is
the required upper bound of the probability of reaching an unsafe state within
$t = 64$ steps.

$$\Phi ::= P_{\le p^*}[true\ U^{\le t}\ unsafe] \qquad (7)$$
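The grid-world dynamics described above can be sketched as a transition function. Several details are assumptions of this sketch and are not specified in the text: cells are indexed row by row, a move into a wall leaves the agent in place, and the 20% random move is drawn uniformly from the five actions (including the intended one).

```python
from typing import Dict, List, Tuple

N = 8  # grid side length
# Action order (assumed): stay, up, down, left, right.
ACTIONS: List[Tuple[int, int]] = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
SLIP = 0.2  # chance of moving randomly instead of following the decision


def step_cell(r: int, c: int, dr: int, dc: int) -> Tuple[int, int]:
    """Move within the grid; bumping into a wall keeps the agent in place (assumption)."""
    nr, nc = r + dr, c + dc
    return (nr, nc) if 0 <= nr < N and 0 <= nc < N else (r, c)


def transition(s: int, a: int) -> Dict[int, float]:
    """Return {next_state: probability} for taking action index a in state s."""
    r, c = divmod(s, N)
    probs: Dict[int, float] = {}

    def add(cell: Tuple[int, int], p: float) -> None:
        idx = cell[0] * N + cell[1]
        probs[idx] = probs.get(idx, 0.0) + p

    add(step_cell(r, c, *ACTIONS[a]), 1.0 - SLIP)        # intended move
    for dr, dc in ACTIONS:                               # uniform random move
        add(step_cell(r, c, dr, dc), SLIP / len(ACTIONS))
    return probs
```

Plugging such a transition table and a set of unsafe cells into a bounded-reachability computation gives the kind of probability that specification (7) bounds.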


Let $\pi_E$ be the optimal policy under the reward map shown in Fig. 1(a). The
probability of entering an unsafe region within 64 steps by following $\pi_E$ is 24.6%.
Now consider the scenario where the expert performs a number of demonstrations
by following $\pi_E$. All demonstrated trajectories in this case successfully reach
the goal areas without ever passing through any of the unsafe regions. Figure 1(b)
shows a representative trajectory (in blue) among 10,000 such demonstrated
trajectories. The resulting reward map by running the AL algorithm on these 10,000
demonstrations is shown in Fig. 1(c). Observe that only the goal area has been
learnt whereas the agent is oblivious to the unsafe regions (treating them in the
same way as other dark cells). In fact, the probability of reaching an unsafe state
within 64 steps with this policy turns out to be 82.6% (thus violating the safety
requirement by a large margin). To make matters worse, the value of $p^*$ may
be decided or revised after a policy has been learnt. In those cases, even the
original expert policy $\pi_E$ may be unsafe, e.g., when $p^* = 20\%$. Thus, we need to
adapt the original AL algorithm so that it takes such safety requirements into
account. Figure 1(d) shows the resulting reward map learned using our proposed
algorithm (to be described in detail later) for $p^* = 20\%$. It clearly matches
well with the color differentiation in the original reward map and captures both
the goal states and the unsafe regions. This policy has an unsafe probability of
19.0%. We are now ready to state our problem.


Definition 1. The safety-aware apprenticeship learning (SafeAL)
problem is, given an MDP\R, a set of $m$ trajectories $\{\tau_0, \tau_1, \ldots, \tau_{m-1}\}$
demonstrated by an expert, and a specification $\Phi$, to learn a policy $\pi$ that satisfies $\Phi$
and is $\epsilon$-close to the expert policy $\pi_E$.


Remark 1. We note that a solution may not always exist for the SafeAL problem.
While the decision problem of checking whether a solution exists is of theoretical
interest, in this paper, we focus on tackling the problem of finding a policy $\pi$
that satisfies a PCTL formula $\Phi$ (if $\Phi$ is satisfiable) and whose performance is as
close to that of the expert's as possible, i.e. we relax the condition on $\mu_\pi$ being
$\epsilon$-close to $\mu_E$.


4 A Framework for Safety-Aware Learning 


In this section, we describe a general framework for safety-aware learning. This
novel framework utilizes information from both the expert demonstrations and
a verifier. The proposed framework is illustrated in Fig. 2. Similar to the
counterexample-guided inductive synthesis (CEGIS) paradigm [22], our framework
consists of a verifier and a learner. The verifier checks if a candidate policy
satisfies the safety specification $\Phi$. In case $\Phi$ is not satisfied, the verifier generates a
counterexample for $\Phi$. The main difference from CEGIS is that our framework
considers not only functional correctness, e.g., safety, but also performance (as
captured by the learning objective). Starting from an initial policy $\pi_0$, each time
the learner learns a new policy, the verifier checks if the specification is
satisfied. If true, then this policy is added to the candidate set; otherwise the verifier
will generate a (minimal) counterexample and add it to the counterexample set.
During the learning phase, the learner uses both the counterexample set and
candidate set to find a policy that is close to the (unknown) expert policy and
far away from the counterexamples. The goal is to find a policy that is $\epsilon$-close
to the expert policy and satisfies the specification. For the grid-world example
introduced in Sect. 3, when $p^* = 5\%$ (thus presenting a stricter safety requirement
compared to the expert policy $\pi_E$), our approach produces a policy with
only a 4.2% probability of reaching an unsafe state within 64 steps (with the
correspondingly inferred reward mapping shown in Fig. 1(d)).


[Figure 2: flowchart. Initialize $\pi_0$. Verifier: property checking of $\pi_i$ against the
specification; if false, find a counterexample and add it to the Counterexample Set
$\{cex_1, cex_2, cex_3, \ldots\}$; if true, test whether the learning objective is met; if it is,
output $\pi^*$, otherwise add $\pi_i$ to the Candidate Set $\{\pi_0, \pi_1, \ldots\}$. Learner: policy
search using the example trajectories, the candidate set and the counterexample set.]


Fig. 2. Our safety-aware learning framework. Given an initial policy $\pi_0$, a specification
$\Phi$ and a learning objective (as captured by $\epsilon$), the framework iterates between a verifier
and a learner to search for a policy $\pi^*$ that satisfies both $\Phi$ and $\epsilon$. One invariant that
this framework maintains is that all the $\pi_i$'s in the candidate policy set satisfy $\Phi$.


Learning from a (minimal) counterexample $cex_\pi$ of a policy $\pi$ is similar to
learning from expert demonstrations. The basic principle of the AL algorithm
proposed in [1] is to find a weight vector $\omega$ under which the expected reward of
$\pi_E$ maximally outperforms any mixture of the policies in the candidate policy set
$\Pi = \{\pi_0, \pi_1, \pi_2, \ldots\}$. Thus, $\omega$ can be viewed as the normal vector of the
hyperplane $\omega^T(\mu - \mu_E) = 0$ that has the maximal distance to the convex hull of the set
$\{\mu_\pi \mid \pi \in \Pi\}$ as illustrated in the 2D feature space in Fig. 3(a). It can be shown






Fig. 3. (a) Learn from expert. (b) Learn from both expert demonstrations and coun- 
terexamples. 


that $\omega^T \mu_\pi \ge \omega^T \mu_{\pi'}$ for all previously found $\pi'$s. Intuitively, this helps to move
the candidate $\mu_\pi$ closer to $\mu_E$. Similarly, we can apply the same max-margin
separation principle to maximize the distance between the candidate policies and
the counterexamples (in the $\mu$ space). Let $CEX = \{cex_0, cex_1, cex_2, \ldots\}$ denote
the set of counterexamples of the policies that do not satisfy the specification $\Phi$.
Maximizing the distance between the convex hulls of the sets $\{\mu_{cex} \mid cex \in CEX\}$
and $\{\mu_\pi \mid \pi \in \Pi\}$ is equivalent to maximizing the distance between the
parallel supporting hyperplanes of the two convex hulls as shown in Fig. 3(b). The
corresponding optimization function is given in Eq. (8).


$$\delta = \max_{\omega} \min_{\pi \in \Pi,\ cex \in CEX} \omega^T (\mu_\pi - \mu_{cex}) \quad s.t.\ \|\omega\|_2 \le 1 \qquad (8)$$


To attain good performance similar to that of the expert, we still want to
learn from $\hat{\mu}_E$. Thus, the overall problem can be formulated as a multi-objective
optimization problem that combines (1) and (8) into (9).

$$\max_{\omega} \min_{\pi \in \Pi,\ \tilde{\pi} \in \Pi,\ cex \in CEX} \left( \omega^T (\mu_E - \mu_\pi),\ \omega^T (\mu_{\tilde{\pi}} - \mu_{cex}) \right) \quad s.t.\ \|\omega\|_2 \le 1 \qquad (9)$$


5 Counterexample-Guided Apprenticeship Learning 


In this section, we introduce the CounterExample Guided Apprenticeship Learning
(CEGAL) algorithm to solve the SafeAL problem. It can be viewed as a
special case of the safety-aware learning framework described in the previous
section. In addition to combining policy verification, counterexample generation
and AL, our approach uses an adaptive weighting scheme to weight the separation
from $\mu_E$ with the separation from $\mu_{cex}$.

$$\max_{\omega} \min_{\pi \in \Pi_S,\ \tilde{\pi} \in \Pi_S,\ cex \in CEX} \omega^T \left( k(\mu_E - \mu_\pi) + (1 - k)(\mu_{\tilde{\pi}} - \mu_{cex}) \right) \qquad (10)$$
$$s.t.\ \|\omega\|_2 \le 1,\ k \in [0,1]$$
$$\omega^T (\mu_E - \mu_\pi) \le \omega^T (\mu_E - \mu_{\pi'}),\ \forall \pi' \in \Pi_S$$
$$\omega^T (\mu_{\tilde{\pi}} - \mu_{cex}) \le \omega^T (\mu_{\pi'} - \mu_{cex'}),\ \forall \pi' \in \Pi_S,\ \forall cex' \in CEX$$

In essence, we take a weighted-sum approach for solving the multi-objective
optimization problem (9). Assume that $\Pi_S = \{\pi_1, \pi_2, \pi_3, \ldots\}$ is a set of
candidate policies that all satisfy $\Phi$ and $CEX = \{cex_1, cex_2, cex_3, \ldots\}$ is a set of
counterexamples. We introduce a parameter $k$ and change (9) into a weighted sum
optimization problem (10). Note that $\pi$ and $\tilde{\pi}$ can be different. The optimal
$\omega$ solved from (10) can be used to generate a new policy $\pi_\omega$ by using algorithms
such as policy iteration. We use a probabilistic model checker, such as
PRISM [13], to check if $\pi_\omega$ satisfies $\Phi$. If it does, then it will be added to $\Pi_S$.
Otherwise, a counterexample generator, such as COMICS [9], is used to generate
a (minimal) counterexample $cex_{\pi_\omega}$, which will be added to $CEX$.
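For a fixed weight vector, the inner minimization of the weighted objective is easy to evaluate because it separates into an expert-margin term and a counterexample-margin term. The sketch below does only that evaluation; it does not perform the outer maximization over $\omega$, which requires a convex solver. The function name and the convention of dropping the second term when no counterexample exists yet are assumptions of this sketch.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def sub(u: Vector, v: Vector) -> List[float]:
    return [a - b for a, b in zip(u, v)]


def weighted_objective(omega: Vector, k: float, mu_E: Vector,
                       mu_safe: Sequence[Vector], mu_cex: Sequence[Vector]) -> float:
    """Inner min of the weighted-sum objective for a fixed omega and k in [0, 1]."""
    # Worst-case margin between the expert and any safe candidate policy.
    expert_margin = min(dot(omega, sub(mu_E, mu)) for mu in mu_safe)
    if not mu_cex:
        return k * expert_margin  # assumption: no counterexamples, no safety term
    # Worst-case margin between any safe candidate and any counterexample.
    safety_margin = min(dot(omega, sub(mu, c)) for mu in mu_safe for c in mu_cex)
    return k * expert_margin + (1.0 - k) * safety_margin
```

With $k = 1$ the second term vanishes and the value is the margin used by plain apprenticeship learning.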


Algorithm 1. Counterexample-Guided Apprenticeship Learning (CEGAL)


 1: Input:
 2:   $M$ ← A partially known MDP\R; $f$ ← A vector of feature functions
 3:   $\mu_E$ ← The expected features of expert trajectories $\{\tau_0, \tau_1, \ldots, \tau_m\}$
 4:   $\Phi$ ← Specification; $\epsilon$ ← Error bound for the expected features;
 5:   $\sigma, \alpha \in (0,1)$ ← Error bound $\sigma$ and step length $\alpha$ for the parameter $k$;
 6: Initialization:
 7:   If $\|\mu_E - \mu_{\pi_0}\|_2 \le \epsilon$, then return $\pi_0$        ▷ $\pi_0$ is the initial safe policy
 8:   $\Pi_S$ ← $\{\pi_0\}$, $CEX$ ← $\emptyset$        ▷ Initialize candidate and counterexample set
 9:   $inf$ ← 0, $sup$ ← 1, $k$ ← $sup$        ▷ Initialize multi-optimization parameter $k$
10:   $\pi_1$ ← Policy learnt from $\mu_E$ via apprenticeship learning
11: Iteration $i$ ($i \ge 1$):
12:   Verifier:
13:     status ← Model_Checker($M$, $\pi_i$, $\Phi$)
14:     If status = SAT, then go to Learner
15:     If status = UNSAT
16:       $cex_{\pi_i}$ ← Counterexample_Generator($M$, $\pi_i$, $\Phi$)
17:       Add $cex_{\pi_i}$ to $CEX$ and solve $\mu_{cex_{\pi_i}}$; go to Learner
18:   Learner:
19:     If status = SAT
20:       If $\|\mu_E - \mu_{\pi_i}\|_2 \le \epsilon$, then return $\pi^*$ ← $\pi_i$
21:         ▷ Terminate. $\pi_i$ is $\epsilon$-close to $\pi_E$
22:       Add $\pi_i$ to $\Pi_S$, $inf$ ← $k$, $k$ ← $sup$        ▷ Update $\Pi_S$, $inf$ and reset $k$
23:     If status = UNSAT
24:       If $|k - inf| \le \sigma$, then return $\pi^*$ ← $\arg\min_{\pi \in \Pi_S} \|\mu_E - \mu_\pi\|_2$
25:         ▷ Terminate. $k$ is too close to its lower bound.
26:       $k$ ← $\alpha \cdot inf + (1 - \alpha)k$        ▷ Decrease $k$ to learn for safety
27:     $\omega_{i+1}$ ← $\arg\max_{\omega} \min_{\pi \in \Pi_S, \tilde{\pi} \in \Pi_S, cex \in CEX} \omega^T (k(\mu_E - \mu_\pi) + (1 - k)(\mu_{\tilde{\pi}} - \mu_{cex}))$
28:       ▷ Note that the multi-objective optimization function recovers AL when $k = 1$
29:     $\pi_{i+1}$, $\mu_{\pi_{i+1}}$ ← Compute the optimal policy $\pi_{i+1}$ and its expected features
        $\mu_{\pi_{i+1}}$ for the MDP $M$ with reward $R(s) = \omega_{i+1}^T f(s)$
30:   Go to next iteration


Algorithm 1 describes CEGAL in detail. With a constant $sup = 1$ and a
variable $inf \in [0, sup]$ for the upper and lower bounds respectively, the learner
determines the value of $k$ within $[inf, sup]$ in each iteration depending on the
outcome of the verifier and uses $k$ in solving (10) in line 27. Like most nonlinear
optimization algorithms, this algorithm requires an initial guess, which is an
initial safe policy $\pi_0$ to make $\Pi_S$ nonempty. A good initial candidate would
be the maximally safe policy, for example one obtained using PRISM-games [15].
Without loss of generality, we assume this policy satisfies $\Phi$. Suppose in iteration
$i$, an intermediate policy $\pi_i$ learnt by the learner in iteration $i - 1$ is verified to
satisfy $\Phi$, then we increase $inf$ to $inf = k$ and reset $k$ to $k = sup$ as shown in
line 22. If $\pi_i$ does not satisfy $\Phi$, then we reduce $k$ to $k = \alpha \cdot inf + (1 - \alpha)k$ as
shown in line 26 where $\alpha \in (0,1)$ is a step length parameter. If $|k - inf| \le \sigma$
and $\pi_i$ still does not satisfy $\Phi$, the algorithm chooses from $\Pi_S$ a best safe policy
$\pi^*$ which has the smallest margin to $\pi_E$ as shown in line 24. If $\pi_i$ satisfies $\Phi$ and
is $\epsilon$-close to $\pi_E$, the algorithm outputs $\pi_i$ as shown in line 19. For the occasions
when $\pi_i$ satisfies $\Phi$ and $inf = sup = k = 1$, solving (10) is equivalent to solving
(1) as in the original AL algorithm.
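The control flow around the parameter $k$ can be sketched independently of the learner and verifier. In the following illustration, `learn`, `verify` and `distance` are hypothetical callbacks standing in for solving (10) plus policy computation, for model checking plus counterexample generation, and for $\|\mu_E - \mu_\pi\|_2$ respectively. Returning the best safe policy when the iteration budget runs out is an assumption of this sketch.

```python
from typing import Any, Callable, List, Tuple

Policy = Any


def cegal(pi0: Policy,
          learn: Callable[[float, List[Policy], List[Any]], Policy],
          verify: Callable[[Policy], Tuple[bool, Any]],
          distance: Callable[[Policy], float],
          eps: float, sigma: float, alpha: float,
          max_iters: int = 50) -> Policy:
    """Skeleton of the adaptive-k loop; pi0 is assumed to satisfy the specification."""
    if distance(pi0) <= eps:
        return pi0
    safe: List[Policy] = [pi0]
    cex_set: List[Any] = []
    inf, sup = 0.0, 1.0
    k = sup
    pi = learn(k, safe, cex_set)          # k = 1 corresponds to plain AL
    for _ in range(max_iters):
        ok, cex = verify(pi)
        if ok:
            if distance(pi) <= eps:
                return pi                 # safe and eps-close to the expert
            safe.append(pi)
            inf, k = k, sup               # raise the lower bound, reset k
        else:
            cex_set.append(cex)
            if abs(k - inf) <= sigma:
                return min(safe, key=distance)  # k cannot be lowered further
            k = alpha * inf + (1.0 - alpha) * k  # lower k to emphasize safety
        pi = learn(k, safe, cex_set)
    return min(safe, key=distance)        # budget exhausted (assumption)
```

Every returned policy is either verified in the current iteration or taken from the set of previously verified policies, which mirrors the invariant that the candidate set only contains safe policies.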


Remark 2. The initial policy $\pi_0$ does not have to be maximally safe, although
such a policy can be used to verify if $\Phi$ is satisfiable at all. Naively safe policies
often suffice for obtaining a safe and performant output at the end. Such a policy
can be obtained easily in many settings, e.g., in the grid-world example one safe
policy is simply staying in the initial cell. In both cases, $\pi_0$ typically has very
low performance since satisfying $\Phi$ is the only requirement for it.


Theorem 1. Given an initial policy $\pi_0$ that satisfies $\Phi$, Algorithm 1 is
guaranteed to output a policy $\pi^*$, such that (1) $\pi^*$ satisfies $\Phi$, and (2) the
performance of $\pi^*$ is at least as good as that of $\pi_0$ when compared to $\pi_E$, i.e.

$$\|\mu_E - \mu_{\pi^*}\|_2 \le \|\mu_E - \mu_{\pi_0}\|_2.$$


Proof Sketch. The first part of the guarantee can be proven by case splitting.
Algorithm 1 outputs $\pi^*$ either when $\pi^*$ satisfies $\Phi$ and is $\epsilon$-close to $\pi_E$, or when
$|k - inf| \le \sigma$ in some iteration. In the first case, $\pi^*$ clearly satisfies $\Phi$. In the
second case, $\pi^*$ is selected from the set $\Pi_S$ which contains all the policies that
have been found to satisfy $\Phi$ so far, so $\pi^*$ satisfies $\Phi$. For the second part of the
guarantee, the initial policy $\pi_0$ is the final output $\pi^*$ if $\pi_0$ satisfies $\Phi$ and is
$\epsilon$-close to $\pi_E$. Otherwise, $\pi_0$ is added to $\Pi_S$ if it satisfies $\Phi$. During the iteration, if
$|k - inf| \le \sigma$ in some iteration, then the final output is
$\pi^* = \arg\min_{\pi \in \Pi_S} \|\mu_E - \mu_\pi\|_2$,
so it must satisfy $\|\mu_E - \mu_{\pi^*}\|_2 \le \|\mu_E - \mu_{\pi_0}\|_2$. If a learnt policy $\pi^*$ satisfies $\Phi$ and
is $\epsilon$-close to $\pi_E$, then Algorithm 1 outputs $\pi^*$ without adding it to $\Pi_S$. Obviously
$\|\mu_E - \mu_\pi\|_2 > \epsilon,\ \forall \pi \in \Pi_S$, so $\|\mu_E - \mu_{\pi^*}\|_2 \le \|\mu_E - \mu_{\pi_0}\|_2$.


Discussion. In the worst case, CEGAL will return the initial safe policy. However,
this can be because a policy that simultaneously satisfies $\Phi$ and is $\epsilon$-close to
the expert's demonstrations does not exist. Compared to AL, which offers no
safety guarantee, and to finding the maximally safe policy, which has very poor
performance, CEGAL provides a principled way of guaranteeing safety while
retaining performance.

Convergence. Algorithm 1 is guaranteed to terminate. Let $inf_t$ be the $t$-th assigned
value of $inf$. After $inf_t$ is given, $k$ is decreased from $k_0 = sup$ iteratively by
$k_i = \alpha \cdot inf_t + (1 - \alpha)k_{i-1}$ until either $|k_i - inf_t| \le \sigma$ in line 24 or a new safe
policy is found in line 18. The update of $k$ satisfies the following equality.

$$\frac{|k_{i+1} - inf_t|}{|k_i - inf_t|} = \frac{\alpha \cdot inf_t + (1 - \alpha)k_i - inf_t}{k_i - inf_t} = 1 - \alpha \qquad (11)$$

Thus, it takes no more than $1 + \log_{1-\alpha} \frac{\sigma}{sup - inf_t}$ iterations for either the
algorithm to terminate in line 24 or a new safe policy to be found in line 18. If a
new safe policy is found in line 18, $inf$ will be assigned in line 22 by the current
value of $k$ as $inf_{t+1} = k$, which obviously satisfies $inf_{t+1} - inf_t \ge (1 - \alpha)\sigma$. After
the assignment of $inf_{t+1}$, the iterative update of $k$ resumes. Since $sup - inf_t \le 1$,
the following inequality holds.

$$\frac{|inf_{t+1} - sup|}{|inf_t - sup|} \le \frac{sup - inf_t - (1 - \alpha)\sigma}{sup - inf_t} \le 1 - (1 - \alpha)\sigma \qquad (12)$$

Obviously, starting from an initial $inf = inf_0 < sup$, with the alternating
update of $inf$ and $k$, $inf$ will keep getting closer to $sup$ unless the algorithm
terminates as in line 24 or a safe policy $\epsilon$-close to $\pi_E$ is found as in line 19. The
extreme case is that finally $inf = sup$ after no more than $\frac{sup - inf_0}{(1 - \alpha)\sigma}$ updates on
$inf$. Then, the problem becomes AL. Therefore, the worst case of this algorithm
can have two phases. In the first phase, $inf$ increases from $inf = 0$ to $inf = sup$.
Between each two consecutive updates $(t, t + 1)$ on $inf$, there are no more than
$\log_{1-\alpha} \frac{(1 - \alpha)\sigma}{sup - inf_t}$ updates on $k$ before $inf$ is increased from $inf_t$ to $inf_{t+1}$. Overall,
this phase takes no more than

$$\sum_{0 \le i < \frac{sup - inf_0}{(1 - \alpha)\sigma}} \log_{1-\alpha} \frac{(1 - \alpha)\sigma}{sup - inf_0 - i \cdot (1 - \alpha)\sigma} = \sum_{0 \le i < \frac{1}{(1 - \alpha)\sigma}} \log_{1-\alpha} \frac{(1 - \alpha)\sigma}{1 - i \cdot (1 - \alpha)\sigma} \qquad (13)$$

iterations to reduce the multi-objective optimization problem to original
apprenticeship learning, and then the second phase begins. Since $k = sup$, the iteration
will stop immediately when an unsafe policy is learnt as in line 24. This phase
will not take more iterations than the original AL algorithm does to converge, and
the convergence result of AL is given in [1].
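The geometric shrinkage of $k - inf_t$ can be checked numerically. The sketch below simulates the update rule under the assumption that every learnt policy is unsafe, counts the updates until $|k - inf_t| \le \sigma$, and compares the count against the bound $1 + \log_{1-\alpha} \frac{\sigma}{sup - inf_t}$. The function names are hypothetical.

```python
import math


def updates_until_sigma(inf_t: float, sup: float, alpha: float, sigma: float) -> int:
    """Count updates k <- alpha * inf_t + (1 - alpha) * k, starting from k = sup,
    until |k - inf_t| <= sigma (assuming no safe policy is found on the way)."""
    k, count = sup, 0
    while abs(k - inf_t) > sigma:
        k = alpha * inf_t + (1.0 - alpha) * k
        count += 1
    return count


def iteration_bound(inf_t: float, sup: float, alpha: float, sigma: float) -> float:
    """The bound 1 + log_{1 - alpha}(sigma / (sup - inf_t))."""
    return 1.0 + math.log(sigma / (sup - inf_t), 1.0 - alpha)
```

For example, with $inf_t = 0$, $sup = 1$, $\alpha = 0.5$ and $\sigma = 10^{-5}$, the gap halves on each update and first drops to at most $\sigma$ after 17 updates, which is within the bound of roughly 17.6.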

In each iteration, the algorithm first solves a second-order cone program- 
ming (SOCP) problem (10) to learn a policy. SOCP problems can be solved in 
polynomial time by interior-point (IP) methods [12]. PCTL model checking for 
DTMCs can be solved in time linear in the size of the formula and polynomial in 
the size of the state space [7]. Counterexample generation can be done either by 
enumerating paths using the k-shortest path algorithm or determining a critical 
subsystem using either an SMT formulation or mixed integer linear programming
(MILP) [23]. For the k-shortest path-based algorithm, it can be computationally
expensive sometimes to enumerate a large number of paths (i.e. a large k) when
$p^*$ is large. This can be alleviated by using a smaller $p^*$ during calculation, which
is equivalent to considering only paths that have high probabilities.


6 Experiments


We evaluate our algorithm on three case studies: (1) grid-world, (2) cart-pole,
and (3) mountain-car. The cart-pole environment¹ and the mountain-car
environment² are obtained from OpenAI Gym. All experiments are carried out
on a quad-core i7-7700K processor running at 3.6 GHz with 16 GB of memory.
Our prototype tool was implemented in Python³. The parameters are
$\gamma = 0.99$, $\epsilon = 10$, $\sigma = 10^{-5}$, $\alpha = 0.5$ and the maximum number of iterations
is 50. For the OpenAI-gym experiments, in each step, the agent sends an action 
to the OpenAI environment and the environment returns an observation and a 
reward (0 or 1). We show that our algorithm can guarantee safety while retaining 
the performance of the learnt policy compared with using AL alone. 


6.1 Grid World 


We first evaluate the scalability of our tool using the grid-world example. Table 1
shows the average runtime (per iteration) for the individual components of our
tool as the size of the grid-world increases. The first and second columns indicate
the size of the grid world and the resulting state space. The third column shows
the average runtime that policy iteration takes to compute an optimal policy $\pi$
for a known reward function. The fourth column indicates the average runtime
that policy iteration takes to compute the expected features $\mu$ for a known policy.
The fifth column indicates the average runtime of verifying the PCTL formula
using PRISM. The last column indicates the average runtime of generating a
counterexample using COMICS.


Table 1. Average runtime per iteration in seconds. 


Size    | Num. of states | Compute $\pi$ | Compute $\mu$ | MC     | Cex
8 × 8   | 64             | 0.02          | 0.02          | 1.39   | 0.014
16 × 16 | 256            | 0.05          | 0.05          | 1.43   | 0.014
32 × 32 | 1024           | 0.07          | 0.08          | 3.12   | 0.035
64 × 64 | 4096           | 6.52          | 25.88         | 22.877 | 1.59


6.2 Cart-Pole from OpenAI Gym 


In the cart-pole environment as shown in Fig. 4(a), the goal is to keep the pole
on a cart from falling over as long as possible by moving the cart either to the
left or to the right in each time step. The maximum step length is $t = 200$. The
position, velocity and angle of the cart and the pole are continuous values and
observable, but the actual dynamics of the system are unknown⁴.


¹ https://github.com/openai/gym/wiki/CartPole-v0.
² https://github.com/openai/gym/wiki/MountainCar-v0.
³ https://github.com/zwc662/CAV2018.




Fig. 4. (a) The cart-pole environment. (b) The cart is at -0.3 and pole angle is -20°.
(c) The cart is at 0.3 and pole angle is 20°. 


A maneuver is deemed unsafe if the pole angle is larger than ±20° while the
cart’s horizontal position is more than ±0.3 as shown in Fig. 4(b) and (c). We
formalize the safety requirement in PCTL as (14).


$$\Phi ::= P_{\le p^*}[true\ U^{\le t}\ (angle < -20° \wedge position < -0.3) \vee (angle > 20° \wedge position > 0.3)] \qquad (14)$$
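The state predicate inside (14) can be written as a small function, and the fraction of recorded trajectories that ever hit an unsafe state gives a rough empirical counterpart to the model-checked probability. This sketch assumes observations laid out as (cart position, cart velocity, pole angle in radians, pole angular velocity); the function names are hypothetical, and the empirical rate is not a substitute for PCTL model checking on the constructed MDP.

```python
import math
from typing import Sequence, Tuple

Obs = Tuple[float, float, float, float]  # (position, velocity, angle_rad, angular_velocity)


def is_unsafe(obs: Obs) -> bool:
    """State predicate: pole and cart are both far out on the same side."""
    position, angle_deg = obs[0], math.degrees(obs[2])
    return ((angle_deg < -20.0 and position < -0.3)
            or (angle_deg > 20.0 and position > 0.3))


def violation_rate(trajectories: Sequence[Sequence[Obs]]) -> float:
    """Fraction of trajectories that visit at least one unsafe state."""
    hits = sum(1 for traj in trajectories if any(is_unsafe(o) for o in traj))
    return hits / len(trajectories)
```

A trajectory counts as a violation as soon as any of its states satisfies the predicate, matching the reachability reading of the until operator with `true` on the left.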


Table 2. In the cart-pole environment, higher average steps mean better performance. 
The safest policy is synthesized using PRISM-games. 


              | MC Result | Avg. Steps | Num. of Iters
AL            | 49.1%     | 165        | 2
Safest Policy | 0.0%      | 8          | N.A.
$p^* = 30\%$  | 17.2%     | 121        | 10
$p^* = 25\%$  | 9.3%      | 136        | 14
$p^* = 20\%$  | 17.2%     | 122        | 10
$p^* = 15\%$  | 6.9%      | 118        | 22
$p^* = 10\%$  | 7.2%      | 136        | 22
$p^* = 5\%$   | 0.04%     | 83         | 50


We used 2000 demonstrations for which the pole is held upright without
violating any of the safety conditions for all 200 steps in each demonstration. The
safest policy synthesized by PRISM-games is used as the initial safe policy. We
also compare the different policies learned by CEGAL for different safety
thresholds $p^*$. In Table 2, the policies are compared in terms of model checking results
(‘MC Result’) on the PCTL property in (14) using the constructed MDP, the
average steps (‘Avg. Steps’) that a policy (executed in the OpenAI environment)
can hold across 5000 rounds (the higher the better), and the number of iterations
(‘Num. of Iters’) it takes for the algorithm to terminate (either converge to an
$\epsilon$-close policy, or terminate due to $\sigma$, or terminate after 50 iterations). The policy
in the first row is the result of using AL alone, which has the best performance
but also a 49.1% probability of violating the safety requirement. The safest
policy as shown in the second row is always safe but has almost no performance at all.
This policy simply lets the pole fall and thus does not risk moving the cart out
of the range [-0.3, 0.3]. On the other hand, it is clear that the policies learnt
using CEGAL always satisfy the safety requirement. From $p^* = 30\%$ to $10\%$, the
performance of the learnt policy is comparable to that of the AL policy.
However, when the safety threshold becomes very low, e.g., $p^* = 5\%$, the performance
of the learnt policy drops significantly. This reflects the phenomenon that the
tighter the safety condition is, the less room there is for the agent to maneuver to
achieve a good performance.


⁴ The MDP is built from sampled data. The feature vector in each state contains
30 radial basis functions which depend on the squared Euclidean distances between
the current state and 30 other states which are uniformly distributed in the state space.


6.3 Mountain-Car from OpenAI Gym 


Our third experiment uses the mountain-car environment from OpenAI Gym.
As shown in Fig. 5(a), a car starts from the bottom of the valley and tries to
reach the mountaintop on the right as quickly as possible. In each time step
the car can perform one of three actions: accelerating to the left, coasting,
or accelerating to the right. The agent fails if the step length reaches the
maximum ($t = 66$). The velocity and position of the car are continuous values
and observable while the exact dynamics are unknown⁵. In this game setting, the
car cannot reach the right mountaintop by simply accelerating to the right. It
has to accumulate momentum first by moving back and forth in the valley. The
safety rules we enforce are shown in Fig. 5(b). They correspond to speed limits
when the car is close to the left mountaintop or to the right mountaintop (in
case there is a cliff on the other side of the mountaintop). Similar to the previous
experiments, we considered 2000 expert demonstrations, all of which
successfully reach the right mountaintop without violating any of the safety
conditions. The average number of steps for the expert to drive the car to the
right mountaintop is 40. We formalize the safety requirement in PCTL as (15).


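$$\varphi ::= P_{\le p^*}\big[\, \mathit{true}\ \mathbf{U}^{\le t}\ (\mathit{speed} \le -0.04 \wedge \mathit{position} \le -1.1) \vee (\mathit{speed} \ge 0.04 \wedge \mathit{position} \ge 0.5) \,\big] \qquad (15)$$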
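The unsafe condition inside (15) can be checked on a single recorded run. The following is a minimal sketch, not the paper's model-checking procedure: it assumes a trajectory is given as a list of (position, speed) pairs, and the function names `violates_speed_limit` and `empirical_violation_rate` are hypothetical. The thresholds come from (15); whether the inequalities are strict is an assumption, and the empirical rate is only a sampling estimate of the probability that PCTL model checking would compute on the MDP.

```python
def violates_speed_limit(trajectory, horizon=66):
    """Return True if an unsafe state occurs within the first `horizon` steps.

    `trajectory` is a sequence of (position, speed) pairs (assumed format).
    A state is unsafe if the car is near the left edge and moving left too
    fast, or near the right edge and moving right too fast.
    """
    for position, speed in trajectory[:horizon + 1]:
        too_fast_left = speed <= -0.04 and position <= -1.1
        too_fast_right = speed >= 0.04 and position >= 0.5
        if too_fast_left or too_fast_right:
            return True
    return False


def empirical_violation_rate(trajectories, horizon=66):
    """Fraction of sampled runs that reach an unsafe state (a rough estimate)."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    violations = sum(violates_speed_limit(run, horizon) for run in trajectories)
    return violations / len(trajectories)
```

A policy would be judged against a threshold p* by comparing such a rate with p*, although only the model-checking result on the constructed MDP gives the guarantee discussed in the text.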


We compare the different policies using the same set of categories as in the 
cart-pole example. The numbers are averaged over 5000 runs. As shown in the 


5 The MDP is built from sampled data. The feature vector for each state contains 2 
exponential functions and 18 radial basis functions which respectively depend on the 
squared Euclidean distances between the current state and other 18 states which are 
uniformly distributed in the state space. 
Fig. 5. (a) The original mountain-car environment. (b) The mountain-car environment
with traffic rules: when the distance from the car to the left edge or the right edge is 
shorter than 0.1, the speed of the car should be lower than 0.04. 


first row, the policy learnt via AL⁶ has the highest probability of going over
the speed limits. We observed that this policy made the car speed up all the
way to the left mountaintop to maximize its potential energy. The safest policy
corresponds to simply staying in the bottom of the valley. The policies learnt
via CEGAL for safety threshold p* ranging from 60% to 50% not only have a
lower probability of violating the speed limits but also achieve comparable
performance. As the safety threshold p* decreases further, the agent becomes more
conservative and it takes more time for the car to finish the task. For p* = 20%,
the agent never succeeds in reaching the top within 66 steps (Table 3).


Table 3. In the mountain-car environment, lower average steps mean better perfor- 
mance. The safest policy is synthesized via PRISM-games. 


Policy                 MC Result   Avg. Steps   Num. of Iters
Policy Learnt via AL   69.2%       54           50
Safest Policy          0.0%        Fail         N.A.
p* = 60%               43.4%       57           9
p* = 50%               47.2%       55           17
p* = 40%               29.3%       61           26
p* = 30%               18.9%       64           17
p* = 20%               4.9%        Fail         40


7 Related Work 


A taxonomy of AI safety problems is given in [3] where the issues of misspecified 
objective or reward and insufficient or poorly curated training data are high- 
lighted. There have been several attempts to address these issues from different 
angles. The problem of safe exploration is studied in [8,17]. In particular, the 
latter work proposes to add a safety constraint, which is evaluated by amount 


6 AL did not converge to an ε-close policy in 50 iterations in this case.
of damage, to the optimization problem so that the optimal policy can maximize
the return without violating the limit on the expected damage. An obvious
shortcoming of this approach is that actual failures will have to occur to properly
assess damage.

Formal methods have been applied to the problem of AI safety. In [5], 
the authors propose to combine machine learning and reachability analysis for 
dynamical models to achieve high performance and guarantee safety. In this 
work, we focus on probabilistic models which are natural in many modern 
machine learning methods. In [20], the authors propose to use formal specifi- 
cation to synthesize a control policy for reinforcement learning. They consider 
formal specifications captured in Linear Temporal Logic, whereas we consider 
PCTL which matches better with the underlying probabilistic model. Recently, 
the problem of safe reinforcement learning was explored in [2] where a monitor
(called a shield) is used to enforce temporal logic properties either during the
learning phase or execution phase of the reinforcement learning algorithm. The 
shield provides a list of safe actions each time the agent makes a decision so that 
the temporal property is preserved. In [11], the authors also propose an approach 
for controller synthesis in reinforcement learning. In this case, an SMT-solver is 
used to find a scheduler (policy) for the synchronous product of an MDP and 
a DTMC so that it satisfies both a probabilistic reachability property and an 
expected cost property. Another approach that leverages PCTL model checking 
is proposed in [16]. A so-called abstract Markov decision process (AMDP) model 
of the environment is first built and PCTL model checking is then used to check 
the satisfiability of safety specification. Our work is similar to these in spirit in 
the application of formal methods. However, while the concept of AL is closely 
related to reinforcement learning, an agent in the AL paradigm needs to learn a 
policy from demonstrations without knowing the reward function a priori. 

A distinguishing characteristic of our method is the tight integration of for- 
mal verification with learning from data (apprenticeship learning in particular). 
Among imitation or apprenticeship learning methods, margin based algorithms 
[1,18, 19] try to maximize the margin between the expert’s policy and all learnt 
policies until the one with the smallest margin is produced. The apprenticeship 
learning algorithm proposed by Abbeel and Ng [1] was largely motivated by the
support vector machine (SVM) in that the features of the expert demonstration are
maximally separated from the features of all other candidate policies. Our algorithm
makes use of this observation when using counterexamples to steer the policy 
search process. Recently, the idea of learning from failed demonstrations started 
to emerge. In [21], the authors propose an IRL algorithm that can learn from 
both successful and failed demonstrations. It is done by reformulating the maximum
entropy algorithm in [24] to find a policy that maximally deviates from the failed
demonstrations while approaching the successful ones as much as possible. How- 
ever, this entropy-based method requires obtaining many failed demonstrations 
and can be very costly in practice. 

Finally, our approach is inspired by the work on formal inductive synthe- 
sis [10] and counterexample-guided inductive synthesis (CEGIS) [22]. These 
frameworks typically combine a constraint-based synthesizer with a verification
oracle. In each iteration, the agent refines her hypothesis (i.e., generates a new
candidate solution) based on counterexamples provided by the oracle. Our approach
can be viewed as an extension of CEGIS where the objective is not just
functional correctness but also meeting certain learning criteria.


8 Conclusion and Future Work 


We propose a counterexample-guided approach for combining probabilistic 
model checking with apprenticeship learning to ensure safety of the apprenticeship
learning outcome. Our approach makes novel use of counterexamples
to steer the policy search process by reformulating the feature matching prob- 
lem into a multi-objective optimization problem that additionally takes safety 
into account. Our experiments indicate that the proposed approach can guar- 
antee safety and retain performance for a set of benchmarks including examples 
drawn from OpenAI Gym. In the future, we would like to explore other imita- 
tion or apprenticeship learning algorithms and extend our techniques to those 
settings. 
Abstract. Probabilistic bisimilarity is an equivalence relation that captures
which states of a labelled Markov chain behave the same. Since this
behavioural equivalence only identifies states that transition to states 
that behave exactly the same with exactly the same probability, this 
notion of equivalence is not robust. Probabilistic bisimilarity distances 
provide a quantitative generalization of probabilistic bisimilarity. The 
distance of states captures the similarity of their behaviour. The smaller 
the distance, the more alike the states behave. In particular, states are 
probabilistic bisimilar if and only if their distance is zero. This quantita- 
tive notion is robust in that small changes in the transition probabilities 
result in small changes in the distances. 

During the last decade, several algorithms have been proposed to 
approximate and compute the probabilistic bisimilarity distances. The 
main result of this paper is an algorithm that decides distance one in
$O(n^2 + m^2)$, where $n$ is the number of states and $m$ is the number of
transitions of the labelled Markov chain. The algorithm is the key new
ingredient of our algorithm to compute the distances. The state of the art
algorithm can compute distances for labelled Markov chains up to 150
states. For one such labelled Markov chain, that algorithm takes more
than 49 h. In contrast, our new algorithm only takes 13 ms. Furthermore,
our algorithm can compute distances for labelled Markov chains
with more than 10,000 states in less than 50 min.


Keywords: Labelled Markov chain · Probabilistic bisimilarity ·
Probabilistic bisimilarity distance


1 Introduction 


A behavioural equivalence captures which states of a model give rise to the same 
behaviour. Bisimilarity, due to Milner [22] and Park [25], is one of the best 
known behavioural equivalences. Verifying that an implementation satisfies a 
specification boils down to checking that the model of the implementation gives 
rise to the same behaviour as the model of the specification, that is, the models
are behaviourally equivalent (see [1, Chap. 3]).

In this paper, we focus on models of probabilistic systems. These models can 
capture randomized algorithms, probabilistic protocols, biological systems and 
many other systems in which probabilities play a central role. In particular, we
consider labelled Markov chains, that is, Markov chains the states of which are 
labelled. 


The above example shows how the behaviour of rolling a die can be mimicked 
by flipping a coin, an example due to Knuth and Yao [19]. Six of the states 
are labelled with the values of a die and the other states are labelled zero. In 
this example, we are interested in the labels representing the value of a die. 
As the reader can easily verify, the states with these labels are each reached
with probability 1/6 from the initial, topmost, state. In general, labels are used
to identify particular states that have properties of interest. As a consequence, 
states with different labels are not behaviourally equivalent. 

Probabilistic bisimilarity, due to Larsen and Skou [21], is a key behavioural 
equivalence for labelled Markov chains. As shown by Katoen et al. [16], mini- 
mizing a labelled Markov chain by identifying those states that are probabilis- 
tic bisimilar speeds up model checking. Probabilistic bisimilarity only identifies 
those states that behave exactly the same with exactly the same probability. If, 
for example, we replace the fair coin in the above example with a biased one, 
then none of the states labelled with zero in the original model with the fair coin 
are behaviourally equivalent to any of the states labelled with zero in the model 
with the biased coin. Behavioural equivalences like probabilistic bisimilarity rely 
on the transition probabilities and, as a result, are sensitive to minor changes 
of those probabilities. That is, such behavioural equivalences are not robust, as 
first observed by Giacalone et al. [12]. 

The probabilistic bisimilarity distances that we study in this paper were first 
defined by Desharnais et al. in [11]. Each pair of states of a labelled Markov 
chain is assigned a distance, a real number in the unit interval [0, 1]. This dis- 
tance captures the similarity of the behaviour of the states. The smaller the 
distance, the more alike the states behave. In particular, states have distance 
zero if and only if they are probabilistic bisimilar. This provides a quantitative 
generalization of probabilistic bisimilarity that is robust in that small changes 
in the transition probabilities give rise to small changes in the distances. For 
example, we can model a biased die by using a biased coin instead of a fair coin 
in the above example. Let us assume that the odds of heads of the biased coin,
that is, going to the left, is 51/100. A state labelled zero in the model of the
fair die has a non-trivial distance, that is, a distance greater than zero and
smaller than one, to the corresponding state in the model of the biased die. For
example, the initial states have distance about 0.036. We refer the reader to [7]
for a more detailed discussion of a similar example.

As we already mentioned earlier, behavioural equivalences can be used to 
verify that an implementation satisfies a specification. Similarly, the distances 
can be used to check how similar an implementation is to a specification. We 
also mentioned that probabilistic bisimilarity can be used to speed up model 
checking. The distances can be used in a similar way, by identifying those states 
that behave almost the same, that is, have a small distance (see [3, 23, 26]). 

We focus in this paper on computing the probabilistic bisimilarity distances. 
In particular, we present a decision procedure for distance one. That is, we com- 
pute the set of pairs of states that have distance one. Recall that distance one 
is the maximal distance and, therefore, captures that states behave very differ- 
ently. States with different labels have distance one. However, also states with 
the same label can have distance one, as the next example illustrates. 


Instead of computing the set of state pairs that have distance one, we compute
the complement, that is, the set of state pairs with distance smaller than one.
Obviously, the set of state pairs with distance zero is included in this set. First,
we decide distance zero. As we mentioned earlier, distance zero coincides with
probabilistic bisimilarity. The first decision procedure for probabilistic
bisimilarity was provided by Baier [4]. More efficient decision procedures were
subsequently proposed by Derisavi et al. [10] and also by Valmari and Franceschinis
[30]. The latter two both run in $O(m \log n)$, where $n$ and $m$ are the number
of states and transitions of the labelled Markov chain. Subsequently, we use
a traversal of a directed graph derived from the labelled Markov chain. This
traversal takes $O(n^2 + m^2)$.

The decision procedures for distance zero and one can be used to compute 
or approximate probabilistic bisimilarity distances as indicated below. 


[Figure: workflow. First compute D_0 and D_1. With few non-trivial distances, apply
SPI. With many non-trivial distances, either compute small distances (Q, then SPPI)
or approximate distances (DI).]


684 Q. Tang and F. van Breugel 


Once we have computed the sets $D_0$ and $D_1$ of state pairs that have distance
zero or one, we can easily compute the number of state pairs with non-trivial
distances. If the number of non-trivial distances is small, then we can use the
simple policy iteration (SPI) algorithm due to Bacci et al. [2] to compute those
distances. Otherwise, we can either compute all distances smaller than a chosen
$\varepsilon > 0$ or we can approximate the distances up to some chosen accuracy $\alpha > 0$.
In the former case, we first compute a query set $Q$ of state pairs that contains
all state pairs the distances of which are at most $\varepsilon$. Subsequently, we apply the
simple partial policy iteration (SPPI) algorithm due to Bacci et al. [2] to compute
the distances for all state pairs in $Q$. In the latter case, we start with a pair of
distance functions, one being a lower bound and the other being an upper bound
of the probabilistic bisimilarity distances, and iteratively improve the accuracy of
those until they are $\alpha$ close. We call this new approximation algorithm distance
iteration (DI) as it is similar in spirit to Bellman's value iteration [5].

Chen et al. [8] presented an algorithm to compute the distances by means of
Khachiyan's ellipsoid method [17]. Though the algorithm is polynomial time, in
practice it is not as efficient as the policy iteration algorithms (see the examples
in [28, Sect. 8]). The state of the art algorithm to compute the probabilistic
bisimilarity distances consists of two components: $D_0$ and SPI. To compare this
algorithm with our new algorithm consisting of the components $D_0$, $D_1$ and SPI,
we implemented all the components in Java and ran both implementations on
several labelled Markov chains. These labelled Markov chains model randomized
algorithms and probabilistic protocols that are part of the distribution of
probabilistic model checkers such as PRISM [20]. Whereas the original state of
the art algorithm can handle labelled Markov chains with up to 150 states, our
new algorithm can handle more than 10,000 states. Furthermore, for one such
labelled Markov chain with 150 states, the original algorithm takes more than
49 h, whereas our new algorithm takes only 13 ms. Also, the new algorithm
consisting of the components $D_0$, $D_1$, $Q$ and SPPI to compute only small distances,
along with the new algorithm consisting of the components $D_0$, $D_1$ and DI to
approximate the distances, gives rise to even shorter execution times for a number
of the labelled Markov chains.

The main contributions of this paper are


- a polynomial decision procedure for distance one,

- an algorithm to compute the probabilistic bisimilarity distances,

- an algorithm to compute those probabilistic bisimilarity distances smaller
than some given $\varepsilon > 0$, and

- an approximation algorithm to compute the probabilistic bisimilarity
distances up to some given accuracy $\alpha > 0$.


Furthermore, by means of experiments we have shown that these three new 
algorithms are very effective, improving significantly on the state of the art. 
2 Labelled Markov Chains and Probabilistic Bisimilarity
Distances


We start by reviewing the model of interest, labelled Markov chains, its most
well known behavioural equivalence, probabilistic bisimilarity due to Larsen and
Skou [21], and the probabilistic bisimilarity pseudometric due to Desharnais et
al. [11]. We denote the set of rational probability distributions on a set $S$ by
$\mathrm{Distr}(S)$. For $\mu \in \mathrm{Distr}(S)$, its support is defined by
$\mathrm{support}(\mu) = \{\, s \in S \mid \mu(s) > 0 \,\}$. Instead of $S \times S$, we often write $S^2$.


Definition 1. A labelled Markov chain is a tuple $\langle S, L, \tau, \ell \rangle$ consisting of


- a nonempty finite set $S$ of states,

- a nonempty finite set $L$ of labels,

- a transition function $\tau : S \to \mathrm{Distr}(S)$, and

- a labelling function $\ell : S \to L$.


For the remainder of this section, we fix such a labelled Markov chain
$\langle S, L, \tau, \ell \rangle$.


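Definition 2. Let $\mu, \nu \in \mathrm{Distr}(S)$. The set $\Omega(\mu, \nu)$ of couplings of $\mu$ and $\nu$ is
defined by

$$\Omega(\mu, \nu) = \Big\{\, \omega \in \mathrm{Distr}(S^2) \;\Big|\; \forall s \in S : \sum_{t \in S} \omega(s, t) = \mu(s) \ \wedge\ \forall t \in S : \sum_{s \in S} \omega(s, t) = \nu(t) \,\Big\}$$

Note that $\omega \in \Omega(\mu, \nu)$ is a joint probability distribution with marginals $\mu$
and $\nu$. The following proposition will be used to prove Proposition 5.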


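Proposition 1. For all $\mu, \nu \in \mathrm{Distr}(S)$ and $X \subseteq S^2$,
$\forall \omega \in \Omega(\mu, \nu) : \mathrm{support}(\omega) \subseteq X$ if and only if $\mathrm{support}(\mu) \times \mathrm{support}(\nu) \subseteq X$.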
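To make couplings and Proposition 1 concrete, the sketch below represents distributions as Python dictionaries with exact `Fraction` probabilities. The helper names (`support`, `is_coupling`, `product_coupling`) are illustrative assumptions, not part of the paper. The product coupling always exists and its support is exactly the product of the two supports, which is why a condition quantified over all couplings reduces to a check on that product set.

```python
from fractions import Fraction


def support(dist):
    """Keys that carry positive probability in a distribution (a dict)."""
    return {key for key, p in dist.items() if p > 0}


def is_coupling(omega, mu, nu):
    """Check that the joint distribution omega has marginals mu and nu."""
    if any(p < 0 for p in omega.values()):
        return False
    states = set(mu) | set(nu) | {s for s, _ in omega} | {t for _, t in omega}
    rows_ok = all(
        sum(omega.get((s, t), 0) for t in states) == mu.get(s, 0) for s in states
    )
    cols_ok = all(
        sum(omega.get((s, t), 0) for s in states) == nu.get(t, 0) for t in states
    )
    return rows_ok and cols_ok


def product_coupling(mu, nu):
    """The independent coupling: omega(s, t) = mu(s) * nu(t)."""
    return {(s, t): p * q for s, p in mu.items() for t, q in nu.items()}
```

For a fair coin coupled with itself, the diagonal coupling has a strictly smaller support than the product coupling, so a set $X$ containing the support of one particular coupling need not contain the supports of all of them.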


Definition 3. An equivalence relation $R \subseteq S^2$ is a probabilistic bisimulation
if for all $(s, t) \in R$, $\ell(s) = \ell(t)$ and there exists $\omega \in \Omega(\tau(s), \tau(t))$ such that
$\mathrm{support}(\omega) \subseteq R$. Probabilistic bisimilarity, denoted $\sim$, is the largest probabilistic
bisimulation.


The probabilistic bisimilarity pseudometric of Desharnais et al. [11] maps
each pair of states of a labelled Markov chain to a distance, an element of the
unit interval $[0, 1]$. Hence, the pseudometric is a function from $S^2$ to $[0, 1]$, that
is, an element of $[0, 1]^{S^2}$. As we will discuss below, it can be defined as a fixed
point of the following function.


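Definition 4. The function $\Delta : [0, 1]^{S^2} \to [0, 1]^{S^2}$ is defined by

$$\Delta(d)(s, t) = \begin{cases} 1 & \text{if } \ell(s) \neq \ell(t) \\ \min\limits_{\omega \in \Omega(\tau(s), \tau(t))} \sum\limits_{u, v \in S} \omega(u, v)\, d(u, v) & \text{otherwise} \end{cases}$$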
Since a concave function on a convex polytope attains its minimum (see [18,
p. 260]), the above minimum exists. We will use this fact in Proposition 4, one
of the key technical results in this paper. We endow the set $[0, 1]^{S^2}$ of functions
from $S^2$ to $[0, 1]$ with the following partial order: $d \sqsubseteq e$ if $d(s, t) \le e(s, t)$ for
all $s, t \in S$. The set $[0, 1]^{S^2}$ together with the order $\sqsubseteq$ forms a complete lattice
(see [9, Chap. 2]). The function $\Delta$ is monotone (see [6, Sect. 3]). According to
the Knaster-Tarski fixed point theorem [29, Theorem 1], a monotone function
on a complete lattice has a least fixed point. Hence, $\Delta$ has a least fixed point,
which we denote by $\mu(\Delta)$. This fixed point assigns to each pair of states their
probabilistic bisimilarity distance.

Given that $\mu(\Delta)$ captures the probabilistic bisimilarity distances, we define
the following sets.

$$D_0 = \{\, (s, t) \in S^2 \mid \mu(\Delta)(s, t) = 0 \,\}$$
$$D_1 = \{\, (s, t) \in S^2 \mid \mu(\Delta)(s, t) = 1 \,\}$$


The probabilistic bisimilarity pseudometric $\mu(\Delta)$ provides a quantitative
generalization of probabilistic bisimilarity as captured by the following result by
Desharnais et al. [11, Theorem 1].


Theorem 1. $D_0 = \{\, (s, t) \in S^2 \mid s \sim t \,\}$.


3 Distance One 


We concluded the previous section with the characterization of $D_0$ as the set of
state pairs that are probabilistic bisimilar. In this section we present a
characterization of $D_1$ as a fixed point of the function introduced in Definition 5.

Let us consider the case that the probabilistic bisimilarity distance of states $s$
and $t$ is one, that is, $\mu(\Delta)(s, t) = 1$. Then $\Delta(\mu(\Delta))(s, t) = 1$. From the
definition of $\Delta$, we can conclude that either $\ell(s) \neq \ell(t)$, or for all couplings
$\omega \in \Omega(\tau(s), \tau(t))$ we have $\mathrm{support}(\omega) \subseteq D_1$.

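We partition the set $S^2$ of state pairs into

$$S^2_0 = \{\, (s, t) \in S^2 \mid s \sim t \,\}$$
$$S^2_1 = \{\, (s, t) \in S^2 \mid \ell(s) \neq \ell(t) \,\}$$
$$S^2_? = S^2 \setminus (S^2_0 \cup S^2_1)$$

Hence, if $\mu(\Delta)(s, t) = 1$, then either $(s, t) \in S^2_1$, or $(s, t) \in S^2_?$ and for all
couplings $\omega \in \Omega(\tau(s), \tau(t))$ we have $\mathrm{support}(\omega) \subseteq D_1$. This leads us to the
following function.


Definition 5. The function $\Gamma : 2^{S^2} \to 2^{S^2}$ is defined by

$$\Gamma(X) = S^2_1 \cup \{\, (s, t) \in S^2_? \mid \forall \omega \in \Omega(\tau(s), \tau(t)) : \mathrm{support}(\omega) \subseteq X \,\}.$$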


Proposition 2. The function $\Gamma$ is monotone.


Since the set $2^{S^2}$ of subsets of $S^2$ endowed with the order $\subseteq$ is a complete
lattice (see [9, Example 2.6(2)]) and the function $\Gamma$ is monotone, we can conclude
from the Knaster-Tarski fixed point theorem that $\Gamma$ has a greatest fixed point,
which we denote by $\nu(\Gamma)$. Next, we show that $D_1$ is a fixed point of $\Gamma$.


Proposition 3. $D_1 = \Gamma(D_1)$.


Since we have already seen that $D_1$ is a fixed point of $\Gamma$, we have that
$D_1 \subseteq \nu(\Gamma)$. To conclude that $D_1$ is the greatest fixed point of $\Gamma$, it remains to
show that $\nu(\Gamma) \subseteq D_1$, which is equivalent to the following.


Proposition 4. $\nu(\Gamma) \setminus D_1 = \emptyset$.


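Proof. Towards a contradiction, assume that $\nu(\Gamma) \setminus D_1 \neq \emptyset$. Let

$$m = \min \{\, \mu(\Delta)(s, t) \mid (s, t) \in \nu(\Gamma) \setminus D_1 \,\}$$
$$M = \{\, (s, t) \in \nu(\Gamma) \setminus D_1 \mid \mu(\Delta)(s, t) = m \,\}$$

Since $\nu(\Gamma) \setminus D_1 \neq \emptyset$, we have that $M \neq \emptyset$. Furthermore,

$$M \subseteq \nu(\Gamma) \setminus D_1. \qquad (1)$$

Since $\nu(\Gamma) \setminus D_1 \subseteq \nu(\Gamma)$, we have

$$M \subseteq \nu(\Gamma) = \Gamma(\nu(\Gamma)) \subseteq S^2_1 \cup S^2_?. \qquad (2)$$

For all $(s, t) \in M$,

$$\begin{aligned}
& (s, t) \in \nu(\Gamma) \wedge (s, t) \notin D_1 && [(1)] \\
\Rightarrow\ & (s, t) \in \Gamma(\nu(\Gamma)) \wedge (s, t) \notin S^2_1 \\
\Rightarrow\ & \forall \omega \in \Omega(\tau(s), \tau(t)) : \mathrm{support}(\omega) \subseteq \nu(\Gamma). && (3)
\end{aligned}$$

For each $(s, t) \in M$, let

$$\omega_{s,t} = \operatorname*{argmin}_{\omega \in \Omega(\tau(s), \tau(t))} \sum_{u, v \in S} \omega(u, v)\, \mu(\Delta)(u, v). \qquad (4)$$

We distinguish the following two cases.

- Assume that there exists $(s, t) \in M$ such that $\mathrm{support}(\omega_{s,t}) \cap D_1 \neq \emptyset$. Let

$$p = \sum_{(u, v) \in \nu(\Gamma) \cap D_1} \omega_{s,t}(u, v).$$

  By (3), we have that $\mathrm{support}(\omega_{s,t}) \subseteq \nu(\Gamma)$. Since $\mathrm{support}(\omega_{s,t}) \cap D_1 \neq \emptyset$
  by assumption, we can conclude that $p > 0$. Again using the fact that
  $\mathrm{support}(\omega_{s,t}) \subseteq \nu(\Gamma)$, we have that

$$\sum_{(u, v) \in \nu(\Gamma) \setminus D_1} \omega_{s,t}(u, v) = 1 - p. \qquad (5)$$

  Furthermore,

$$\begin{aligned}
m &= \mu(\Delta)(s, t) \\
&= \min_{\omega \in \Omega(\tau(s), \tau(t))} \sum_{u, v \in S} \omega(u, v)\, \mu(\Delta)(u, v) \\
&= \sum_{u, v \in S} \omega_{s,t}(u, v)\, \mu(\Delta)(u, v) && [(4)] \\
&= \sum_{(u, v) \in \nu(\Gamma)} \omega_{s,t}(u, v)\, \mu(\Delta)(u, v) && [(3)] \\
&= \sum_{(u, v) \in \nu(\Gamma) \cap D_1} \omega_{s,t}(u, v)\, \mu(\Delta)(u, v) + \sum_{(u, v) \in \nu(\Gamma) \setminus D_1} \omega_{s,t}(u, v)\, \mu(\Delta)(u, v) \\
&= p + \sum_{(u, v) \in \nu(\Gamma) \setminus D_1} \omega_{s,t}(u, v)\, \mu(\Delta)(u, v) \\
&\ge p + (1 - p)\, m.
\end{aligned}$$

  The last step follows from (5) and the fact that $\mu(\Delta)(u, v) \ge m$ for all
  $(u, v) \in \nu(\Gamma) \setminus D_1$. From the facts that $p > 0$ and $m \ge p + (1 - p)\, m$ we can
  conclude that $m \ge 1$. This contradicts (1).

- Otherwise, $\mathrm{support}(\omega_{s,t}) \cap D_1 = \emptyset$ for all $(s, t) \in M$. Next, we will show
  that $M$ is a probabilistic bisimulation under this assumption. From the fact
  that $M$ is a probabilistic bisimulation, we can conclude from Theorem 1 that
  $\mu(\Delta)(s, t) = 0$ for all $(s, t) \in M$. Hence, since $M \neq \emptyset$ we have that $M \cap S^2_0 \neq \emptyset$
  which contradicts (2).
  Next, we prove that $M$ is a probabilistic bisimulation. Let $(s, t) \in M$. Since
  $M \subseteq \nu(\Gamma) \setminus D_1$ by (1), we have that $(s, t) \notin D_1$ and, hence, $\Delta(\mu(\Delta))(s, t) =
  \mu(\Delta)(s, t) < 1$. From the definition of $\Delta$, we can conclude that $\ell(s) = \ell(t)$.
  Since

$$\begin{aligned}
m &= \mu(\Delta)(s, t) \\
&= \sum_{(u, v) \in \nu(\Gamma) \setminus D_1} \omega_{s,t}(u, v)\, \mu(\Delta)(u, v) && [\text{as above}]
\end{aligned}$$

  and $\mu(\Delta)(u, v) \ge m$ for all $(u, v) \in \nu(\Gamma) \setminus D_1$, we can conclude that
  $\mu(\Delta)(u, v) = m$ for all $(u, v) \in \mathrm{support}(\omega_{s,t})$. Hence, $\mathrm{support}(\omega_{s,t}) \subseteq M$.
  Therefore, $M$ is a probabilistic bisimulation.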


Theorem 2. $D_1 = \nu(\Gamma)$.


Proof. Immediate consequence of Propositions 3 and 4.
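The characterization in Theorem 2 suggests a direct, if naive, way to compute the set of pairs at distance one: start from all state pairs and apply the function of Definition 5 until nothing changes. The sketch below is an illustration under stated assumptions, not the paper's algorithm (which uses the graph traversal of Sect. 4). It assumes the chain is given as dictionaries `tau` and `label`, that the set `bisimilar` of probabilistic bisimilar pairs has already been computed, and it uses Proposition 1 to replace the quantification over couplings by a check on the product of supports. The function name `distance_one_pairs` is hypothetical.

```python
def support(dist):
    """States with positive probability in a distribution given as a dict."""
    return {s for s, p in dist.items() if p > 0}


def distance_one_pairs(states, tau, label, bisimilar):
    """Greatest fixed point of Gamma, found by iterating downwards from S^2."""
    all_pairs = {(s, t) for s in states for t in states}
    different_label = {(s, t) for (s, t) in all_pairs if label[s] != label[t]}
    # Pairs with equal labels that are not bisimilar (the set written S^2_?).
    undecided = all_pairs - different_label - set(bisimilar)

    def gamma(x):
        # By Proposition 1, "every coupling has its support in x" holds
        # exactly when the product of the two supports is contained in x.
        kept = {
            (s, t)
            for (s, t) in undecided
            if all((u, v) in x for u in support(tau[s]) for v in support(tau[t]))
        }
        return different_label | kept

    current = all_pairs
    while True:
        following = gamma(current)
        if following == current:
            return current
        current = following
```

Because the function is monotone on a finite lattice, the downward iteration stabilizes at the greatest fixed point after at most $n^2$ rounds, which is less efficient than the reachability formulation but follows the definition line by line.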


We have shown that $D_1$ can be characterized as the greatest fixed point of $\Gamma$.
Next, we will show that $D_1$ can be decided in polynomial time.


Theorem 3. Distance one can be decided in $O(n^2 + m^2)$.


Proof. As we will show in Theorem 5, distance smaller than one can be decided
in $O(n^2 + m^2)$. Hence, distance one can be decided in $O(n^2 + m^2)$ as well.


4 Distance Smaller Than One 


To compute the set of state pairs which have distance one, we can first compute
the set of state pairs which have distance less than one. The latter set we denote
by $D_{<1}$. We can then obtain $D_1$ by taking the complement of $D_{<1}$. As we will
discuss below, $D_{<1}$ can be characterized as the least fixed point of the following
function.


Definition 6. The function $\Psi : 2^{S^2} \to 2^{S^2}$ is defined by

$$\Psi(X) = S^2 \setminus \Gamma(S^2 \setminus X).$$

The next theorem follows from Theorem 2.


Theorem 4. $D_{<1} = \mu(\Psi)$.


Next, we show that the computation of $D_{<1}$ can be formulated as a reachability
problem on a directed graph which is induced by the labelled Markov
chain. Thus, we can use standard search algorithms, for example, breadth-first
search, on the induced graph.

Next, we present the graph induced by the labelled Markov chain.


Definition 7. The directed graph $G = (V, E)$ is defined by

$$V = S^2_0 \cup S^2_?$$
$$E = \{\, \langle (u, v), (s, t) \rangle \mid \tau(s)(u) > 0 \wedge \tau(t)(v) > 0 \,\}$$


We are left to show that in the graph $G$ defined above, a vertex $(s, t)$ is
reachable from some vertex in $S^2_0$ if and only if the state pair $(s, t)$ in the labelled
Markov chain has distance less than one.

As we have discussed earlier, if a state pair $(s, t)$ has distance one, either $s$
and $t$ have different labels, or for all couplings $\omega \in \Omega(\tau(s), \tau(t))$ we have that
$\mathrm{support}(\omega) \subseteq D_1$. To avoid the universal quantification over couplings, we will
use Proposition 1 in the proof of the following proposition.
Proposition 5. $\mu(\Psi) = \{\, (s, t) \mid (s, t) \text{ is reachable from some } (u, v) \in S^2_0 \,\}$.


Theorem 5. Distance smaller than one can be decided in $O(n^2 + m^2)$.


Proof. Distance smaller than one can be decided as follows.


1. Decide distance zero.
2. Breadth-first search of $G$, with the queue initially containing the pairs of
states that have distance zero.


By Theorem 4 and Proposition 5, we have that $s$ and $t$ have distance smaller
than one if and only if $(s, t)$ is reachable in the directed graph $G$ from some
$(u, v)$ such that $u$ and $v$ have distance zero. These reachable state pairs can be
computed using breadth-first search, with the queue initially containing $S^2_0$.

Distance zero, that is, probabilistic bisimilarity, can be decided in $O(m \log n)$
as shown by Derisavi et al. in [10]. The directed graph $G$ has at most $n^2$ vertices
and $m^2$ edges. Hence, breadth-first search takes $O(n^2 + m^2)$.
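The two steps in the proof can be sketched as follows. This is an illustrative Python sketch rather than the authors' Java implementation: distance zero is decided here with a naive partition refinement instead of the $O(m \log n)$ algorithm of Derisavi et al., and the chain encoding (dictionaries `tau` and `label`) as well as all function names, including the small `example_chain`, are assumptions made for the example.

```python
from collections import deque
from fractions import Fraction


def bisimilar_pairs(states, tau, label):
    """Probabilistic bisimilarity by naive partition refinement."""
    block = {s: label[s] for s in states}
    while True:
        signature = {}
        for s in states:
            mass = {}
            for u, p in tau[s].items():
                if p > 0:
                    mass[block[u]] = mass.get(block[u], 0) + p
            # Keep the old block in the signature so blocks are only split.
            signature[s] = (block[s], frozenset(mass.items()))
        ids = {}
        refined = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        if len(set(refined.values())) == len(set(block.values())):
            return {(s, t) for s in states for t in states if refined[s] == refined[t]}
        block = refined


def distance_less_than_one(states, tau, label):
    """Pairs reachable in G from the bisimilar pairs (breadth-first search)."""
    predecessors = {s: [] for s in states}
    for s in states:
        for u, p in tau[s].items():
            if p > 0:
                predecessors[u].append(s)
    reached = bisimilar_pairs(states, tau, label)
    queue = deque(reached)
    while queue:
        u, v = queue.popleft()
        # An edge leads from (u, v) to (s, t) when s can move to u and t to v.
        for s in predecessors[u]:
            for t in predecessors[v]:
                if label[s] == label[t] and (s, t) not in reached:
                    reached.add((s, t))
                    queue.append((s, t))
    return reached


def example_chain():
    """Hypothetical five-state chain used only to exercise the sketch."""
    states = ['a', 'b', 'c', 'd', 'e']
    label = {'a': 'x', 'b': 'x', 'e': 'x', 'c': 'y', 'd': 'z'}
    one, half = Fraction(1), Fraction(1, 2)
    tau = {'a': {'c': one}, 'b': {'d': one}, 'c': {'c': one},
           'd': {'d': one}, 'e': {'c': half, 'd': half}}
    return states, tau, label
```

In the example chain, states a and b share a label but move to states with different labels, so the pair (a, b) is never reached and has distance one, whereas (a, e) is reached through the bisimilar pair (c, c) and therefore has distance smaller than one.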


5 Number of Non-trivial Distances 


As we have already discussed earlier, distance zero captures that states behave 
exactly the same, that is, they are probabilistic bisimilar, and distance one indi- 
cates that states behave very differently. The remaining distances, that is, those 
greater than zero and smaller than one, we call non-trivial. Being able to deter- 
mine quickly the number of non-trivial distances of a labelled Markov chain 
allows us to decide whether computing all these non-trivial distances (using 
some policy iteration algorithm) is feasible. 

To determine the number of non-trivial distances of a labelled Markov chain, 
we use the following algorithm. 


1. Decide distance zero. 
2. Decide distance one. 


As first proved by Baier [4], distance zero, that is, probabilistic bisimilarity, 
can be decided in polynomial time. As we proved in Theorem 3, distance one
can be decided in polynomial time as well. Hence, we can compute the number 
of non-trivial distances in polynomial time. 
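Since every state pair has distance zero, distance one, or a non-trivial distance, the count follows by subtraction from the total number of pairs. The helper below is a small sketch with a hypothetical name, `count_non_trivial`; the sample figures in the usage note are the ones reported for the experiments later in this section.

```python
def count_non_trivial(num_states, num_distance_zero, num_distance_one):
    """Number of state pairs whose distance lies strictly between 0 and 1."""
    total_pairs = num_states * num_states
    non_trivial = total_pairs - num_distance_zero - num_distance_one
    if non_trivial < 0:
        raise ValueError("more decided pairs than state pairs")
    return non_trivial
```

For instance, with 8 states, 38 pairs at distance zero and 14 pairs at distance one, `count_non_trivial(8, 38, 14)` gives 12, matching the value reported for Herman's algorithm with N = 3.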

To decide distance zero, we implemented the algorithm to decide probabilistic
bisimilarity due to Derisavi et al. [10] in Java. We also implemented our algorithm 
to decide distance one, described in the proof of Theorems 3 and 5. 

We applied our implementation to labelled Markov chains that model ran- 
domized algorithms and probabilistic protocols. These labelled Markov chains 
have been obtained from the verification tool PRISM [20]. We compute the num- 
ber of non-trivial distances for two models: the randomized self-stabilising algo- 
rithm due to Herman [14] and the bounded retransmission protocol by Helmink 
et al. [13]. 
For the randomized self-stabilising algorithm, the size of the labelled Markov
chain grows exponentially in the number of processes, N. The results for the
randomized self-stabilising algorithm are shown in the table below. As we can
see from the table, for systems up to 128 states, the algorithm runs for less than
a second. For the system with 512 states, the algorithm terminates within seven
minutes. For the case N = 3, there are only 12 non-trivial distances. The size
is so small that we can easily compute all the non-trivial distances. Section 6
will use the simple policy iteration algorithm as the next step to compute them.
The same applies to the case N = 5. For N = 7 or 9, the number of non-trivial
distances is around 11,000 and 230,000, respectively. This makes computing all
of them infeasible. Thus, instead of computing all of them, we need to find
alternative ways to handle systems with a large number of non-trivial distances.
We will discuss two alternative ways in Sects. 7 and 8. Moreover, in this example,
as $|D_1| = |S^2_1|$, we know that all the state pairs with distance one are those that
have different labels.


N   |S|   D_0 + D_1   Non-trivial   |D_0|    |D_1|    |S^2_1|
3   8     1.00 ms     12            38       14       14
5   32    6.06 ms     280           304      440      440
7   128   0.77 s      11,032        2,160    3,192    3,192
9   512   378.42 s    230,712       13,648   17,784   17,784


In the bounded retransmission protocol, there are two parameters: N denotes 
the number of chunks and M the maximum allowed number of retransmissions 
of each chunk. The results are shown in the table below. The algorithm can 
handle systems up to 3,526 states within 11 min. In this example, there are no 
non-trivial distances. As a consequence, deciding distance zero and one suffices 
to compute all the distances in this case. 


N    M     |S|   D0 + D1 (time)         |D0|    |D1|   |S^2_1|
16   2     677   3.0 s               456,977   1,352     1,352
16   3     886   8.6 s               783,226   1,770     1,770
16   4   1,095   17.5 s            1,196,837   2,188     2,188
16   5   1,304   22.8 s            1,697,810   2,606     2,606
32   2   1,349   24.7 s            1,817,105   2,696     2,696
32   3   1,766   69.7 s            3,115,226   3,530     3,530
32   4   2,183   141.0 s           4,761,125   4,364     4,364
32   5   2,600   208.6 s           6,754,802   5,198     5,198
64   2   2,693   235.2 s           7,246,865   5,384     5,384
64   3   3,526   616.4 s          12,425,626   7,050     7,050


6 All Distances


To compute all distances of a labelled Markov chain, we augment the existing 
state of the art algorithm, which is based on algorithms due to Derisavi et al. 
[10] (step 1) and Bacci et al. [2] (step 3), by incorporating our decision procedure 
(step 2) as follows. 


1. Decide distance zero. 
2. Decide distance one. 
3. Simple policy iteration. 


Given that we not only decide distance zero, but also distance one, before 
running simple policy iteration, the correctness of the simple policy iteration 
algorithm in the augmented setting needs an adjusted proof. 

As we already discussed in the previous section, steps 1 and 2 run in polynomial
time. However, step 3 may take at least exponential time in the worst case, as
we have shown in [27]. Hence, the overall algorithm takes exponential time in the
worst case.

The first example we consider here is the synchronous leader election protocol
of Itai and Rodeh [15], which is taken from PRISM. The protocol takes the
number of processors, N, and a constant K as parameters. We compare the
running time of our new algorithm with the state-of-the-art algorithm, which
combines algorithms due to Derisavi et al. and due to Bacci et al. The results
are shown in the table below. In this protocol, the number of non-trivial distances
is zero. Thus, our new algorithm terminates without running step 3, which is the
simple policy iteration algorithm. On the other hand, the original simple policy
iteration algorithm computes the distances of all the elements in the set
$D_1 \setminus S^2_1$, the size of which is huge, as can be seen from the last two
columns of the table.


N   K      |S|   D0 + SPI   D0 + D1 + SPI     Speed-up         |D0|          |D1|   |S^2_1|
3   2       26   4 s        1 ms                 4,281          122           554        50
3   4      147   49 h       13 ms           13,800,000        7,419        14,190       292
3   6      459   -          214 ms                   -       88,671       122,010       916
3   8    1,059   -          3 s                      -      508,851       612,680     2,116
4   2       61   812 s      3 ms               305,000          459         3,262       120
4   4      812   -          388 ms                   -      145,780       513,564     1,622
4   6    3,962   -          82 s                     -    4,350,292    11,347,152     7,922
4   8   12,400   -          2,971 s                  -   46,198,188   107,561,812    24,798
5   2      141   -          6 ms                     -        2,399        17,482       280
5   4    4,244   -          33 s                     -    3,318,662    14,692,874     8,486
6   2      335   -          25 ms                    -       14,327        97,898       668


The simple policy iteration algorithm can only handle a limited number of
states. For the labelled Markov chain with 26 states (N = 3 and K = 2), the
simple policy iteration algorithm takes four seconds, while our new algorithm
takes one millisecond. The speed-up is more than 4,000 times. For the labelled
Markov chain with 61 states (N = 4 and K = 2), the simple policy iteration
algorithm runs in 812 s, while our new algorithm takes three milliseconds. The
speed-up of the new algorithm is about 300,000 times. The biggest system the
simple policy iteration algorithm can handle is the one with 147 states (N = 3
and K = 4), and it takes more than 49 h. In contrast, our new algorithm terminates
within 13 ms. That makes the new algorithm seven orders of magnitude faster than
the state-of-the-art algorithm. This example also shows that the new algorithm
can handle systems with at least 12,400 states.

In the second example, we model two dice, one using a fair coin and the other
one using a biased coin. The goal is to compute the probabilistic bisimilarity
distance between these two dice. An implementation of the die algorithm is part
of PRISM. The resulting labelled Markov chain has 20 states.

As there are only 30 non-trivial distances, we run the simple policy iteration 
algorithm as step 3. The new algorithm is about 46 times faster than the original 
algorithm. 


|S|   D0 + SPI   D0 + D1 + SPI   Speed-up   Non-trivial   |D0|   |D1|   |S^2_1|
20    5.55 s     0.12 s             46.25            30     20    350       198


7 Small Distances 


As we have discussed in Sect. 5, for systems in which the number of non-trivial
distances is so large that computing all of them is infeasible, we have to find
alternative ways. In practice, as we are typically interested only in the state
pairs with small distances, we can cut down the number of non-trivial distances
to be computed by only computing those that are small.

To compute the non-trivial distances smaller than a positive number, $\varepsilon$,
we use the following algorithm.


1. Decide distance zero. 
2. Decide distance one. 
3. Compute the query set

   $$Q = \{ (s,t) \in S^2 \setminus (D_0 \cup D_1) \mid \Delta(d)(s,t) \le \varepsilon \}$$

   where

   $$d(s,t) = \begin{cases} 1 & \text{if } (s,t) \in D_1 \\ 0 & \text{otherwise} \end{cases}$$

4. Simple partial policy iteration for Q.


The first two steps remain the same. In step 3, we compute a query set Q
that contains all state pairs with distances no greater than $\varepsilon$, as
shown in Proposition 6. In step 4, we use this set as the query set to run the
simple partial policy iteration algorithm by Bacci et al. [2].

Proposition 6. Let d be the distance function defined in step 3. For all
$(s,t) \in S^2 \setminus (D_0 \cup D_1)$, if $\mu(\Delta)(s,t) \le \varepsilon$,
then $\Delta(d)(s,t) \le \varepsilon$.
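One way to see why Proposition 6 holds is sketched below. The sketch assumes that $\Delta$ is monotone with respect to the pointwise order $\sqsubseteq$ on distance functions and that $\mu(\Delta)$ is the least fixed point of $\Delta$, that is, the function that assigns to each state pair its probabilistic bisimilarity distance. Since $d$ equals one only on $D_1$, where the distance is one by definition, and zero elsewhere, we have $d \sqsubseteq \mu(\Delta)$. For a pair $(s,t)$ with $\mu(\Delta)(s,t) \le \varepsilon$ this gives:

```latex
\begin{align*}
d \sqsubseteq \mu(\Delta)
  &\implies \Delta(d) \sqsubseteq \Delta(\mu(\Delta)) = \mu(\Delta) \\
  &\implies \Delta(d)(s,t) \le \mu(\Delta)(s,t) \le \varepsilon .
\end{align*}
```

Hence every non-trivial pair whose distance is at most $\varepsilon$ ends up in Q, although Q may also contain pairs whose actual distance is larger.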


Given that we not only decide distance zero, but also distance one, before 
running simple partial policy iteration, the correctness of the simple partial 
policy iteration algorithm in the augmented setting needs an adjusted proof. 

As we have seen before, steps 1 and 2 take polynomial time. In step 3, computing
$\Delta(d)$ corresponds to solving a minimum cost network flow problem. Such
a problem can be solved in polynomial time using, for example, Orlin's network
simplex algorithm [24]. As we have shown in [28], step 4 takes at least
exponential time in the worst case. Therefore, the overall algorithm takes
exponential time in the worst case.
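To make step 3 concrete, the following Python sketch computes $\Delta(d)(s,t)$ as a minimum cost flow and then builds the query set Q. It is an illustration under stated assumptions, not the implementation used in the experiments: the flow problem is solved by successive shortest augmenting paths with Bellman-Ford rather than by the network simplex algorithm mentioned above, the sets D0 and D1 are taken as given inputs, and the names `min_coupling_cost`, `delta`, `query_set` as well as the four-state chain are hypothetical.

```python
from fractions import Fraction


def min_coupling_cost(mu, nu, d):
    """Minimum of sum_{u,v} w(u,v) * d[u, v] over all couplings w of mu, nu."""
    us = [u for u in mu if mu[u] > 0]
    vs = [v for v in nu if nu[v] > 0]
    n = len(us) + len(vs) + 2
    src, snk = n - 2, n - 1
    # Each edge is [target, capacity, cost, index of the reverse edge].
    graph = [[] for _ in range(n)]

    def add_edge(a, b, cap, cost):
        graph[a].append([b, cap, cost, len(graph[b])])
        graph[b].append([a, Fraction(0), -cost, len(graph[a]) - 1])

    for i, u in enumerate(us):
        add_edge(src, i, Fraction(mu[u]), Fraction(0))
    for j, v in enumerate(vs):
        add_edge(len(us) + j, snk, Fraction(nu[v]), Fraction(0))
    for i, u in enumerate(us):
        for j, v in enumerate(vs):
            add_edge(i, len(us) + j, Fraction(1), Fraction(d[u, v]))

    total = Fraction(0)
    while True:
        # Bellman-Ford: cheapest path from src to snk in the residual graph.
        dist, prev = [None] * n, [None] * n
        dist[src] = Fraction(0)
        for _ in range(n - 1):
            for a in range(n):
                if dist[a] is None:
                    continue
                for i, (b, cap, cost, _rev) in enumerate(graph[a]):
                    if cap > 0 and (dist[b] is None or dist[a] + cost < dist[b]):
                        dist[b], prev[b] = dist[a] + cost, (a, i)
        if dist[snk] is None:
            return total  # all probability mass has been shipped
        push, v = None, snk
        while v != src:  # bottleneck capacity along the path
            a, i = prev[v]
            cap = graph[a][i][1]
            push = cap if push is None else min(push, cap)
            v = a
        v = snk
        while v != src:  # augment along the path
            a, i = prev[v]
            edge = graph[a][i]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = a
        total += push * dist[snk]


def delta(d, s, t, label, tau):
    """Delta(d)(s, t): one if the labels differ, else the cheapest coupling."""
    if label[s] != label[t]:
        return Fraction(1)
    return min_coupling_cost(tau[s], tau[t], d)


def query_set(states, label, tau, D0, D1, eps):
    """Non-trivial pairs (s, t) with Delta(d)(s, t) <= eps, d as in step 3."""
    d = {(s, t): Fraction(1 if (s, t) in D1 else 0)
         for s in states for t in states}
    return {(s, t) for s in states for t in states
            if (s, t) not in D0 and (s, t) not in D1
            and delta(d, s, t, label, tau) <= eps}


# Hypothetical chain: s1 and s2 flip differently biased coins.
states = ["h", "t", "s1", "s2"]
label = {"h": "H", "t": "T", "s1": "c", "s2": "c"}
tau = {
    "h": {"h": Fraction(1)},
    "t": {"t": Fraction(1)},
    "s1": {"h": Fraction(1, 2), "t": Fraction(1, 2)},
    "s2": {"h": Fraction(1, 4), "t": Fraction(3, 4)},
}
D0 = {(s, s) for s in states}  # assumed output of step 1
D1 = {(s, t) for s in states for t in states if label[s] != label[t]}  # step 2
Q_large = query_set(states, label, tau, D0, D1, Fraction(3, 10))
Q_small = query_set(states, label, tau, D0, D1, Fraction(1, 10))
```

In this toy chain the only non-trivial pairs are (s1, s2) and (s2, s1), whose distance is 1/4, so they are selected for the larger threshold and dropped for the smaller one.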

We consider the randomized quicksort algorithm, an implementation of which
is part of jpf-probabilistic [31]. The input of the algorithm is the list to be sorted.
The list of size 6 gives rise to a labelled Markov chain with 82 states. We compare
the running time of the new algorithm for small distances (D0 + D1 + Q + SPPI)
to the original algorithm (D0 + SPI) and the new algorithm presented in Sect. 6
(D0 + D1 + SPI). The original algorithm (D0 + SPI) takes about 14 h, whereas the
new algorithm that incorporates the decision procedure for distance one takes
less than 7 h. For $\varepsilon = 0.1$, the new algorithm for small distances takes
57 min. This makes it about 7 times faster than the algorithm presented in Sect. 6
and about 15 times faster than the original simple policy iteration algorithm.
For $\varepsilon = 0.01$, the new algorithm for small distances takes even less
time, namely 41 min. As can be seen in the table below, the total number of
non-trivial distances is 2,300. The simple partial policy iteration algorithm
starts with the query set Q but may have to compute the distances of other state
pairs as well. The total number of state pairs considered by the simple partial
policy iteration algorithm can be found in the column labelled Total.


ε      D0 + D1 + Q + SPPI   |Q|   Total   Non-trivial
0.1    57 min                96   1,002         2,300
0.01   41 min                84     842         2,300


8 Approximation Algorithm 


We propose another solution to deal with a large number of non-trivial distances
by approximating the distances rather than computing the exact values. To
approximate the distances such that the approximate values differ from the exact
ones by at most $\alpha$, a positive number, we use the following algorithm.


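1. Decide distance zero.
2. Decide distance one.
3. Distance iteration:

   $$l(s,t) = \begin{cases} 1 & \text{if } (s,t) \in D_1 \\ 0 & \text{otherwise} \end{cases}
   \qquad
   u(s,t) = \begin{cases} 0 & \text{if } (s,t) \in D_0 \\ 1 & \text{otherwise} \end{cases}$$

   repeat
     for each $(s,t) \in S^2 \setminus (D_0 \cup D_1)$
       if $l(s,t) \ne u(s,t)$
         $l(s,t) = \Delta(l)(s,t)$
         $u(s,t) = \Delta(u)(s,t)$
   until $\|l - u\| < \alpha$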


Again, the first two steps remain the same. Step 3 contains the new
approximation algorithm called distance iteration (DI). In this step, we define
two distance functions, a lower bound $l$ and an upper bound $u$. We repeatedly
apply $\Delta$ to these two functions until the difference of the non-trivial
distances in these two functions is smaller than the threshold $\alpha$. For each
state pair we end up with an interval of size at most $\alpha$ in which their
distance lies. To prove the algorithm correct, we modify the function $\Delta$
defining the probabilistic bisimilarity distances slightly as follows.


Definition 8. The function $\Delta_0 : [0,1]^{S^2} \to [0,1]^{S^2}$ is defined by

$$\Delta_0(d)(s,t) = \begin{cases} 0 & \text{if } (s,t) \in D_0 \\ \Delta(d)(s,t) & \text{otherwise} \end{cases}$$


Some properties of $\Delta_0$, which are key to the correctness proof of the above
algorithm, are collected in the following theorem.


Theorem 6.

(a) The function $\Delta_0$ is monotone.
(b) The function $\Delta_0$ is nonexpansive.
(c) $\mu(\Delta_0) = \mu(\Delta)$.
(d) $\mu(\Delta_0) = \nu(\Delta_0)$.
(e) $\mu(\Delta_0) = \sup_{m \in \mathbb{N}} \Delta_0^m(d_0)$, where $d_0(s,t) = 0$ for all $s, t \in S$.
(f) $\nu(\Delta_0) = \inf_{n \in \mathbb{N}} \Delta_0^n(d_1)$, where $d_1(s,t) = 1$ for all $s, t \in S$.
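The distance iteration of step 3 can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the implementation evaluated below: D0 and D1 are taken as given, both bounds are updated from the values of the previous round, the coupling problem is solved by successive shortest augmenting paths with exact rational arithmetic, and the names `min_coupling_cost` and `distance_iteration` as well as the four-state chain are hypothetical.

```python
from fractions import Fraction


def min_coupling_cost(mu, nu, d):
    """Minimum of sum_{u,v} w(u,v) * d[u, v] over all couplings w of mu, nu."""
    us = [u for u in mu if mu[u] > 0]
    vs = [v for v in nu if nu[v] > 0]
    n = len(us) + len(vs) + 2
    src, snk = n - 2, n - 1
    # Each edge is [target, capacity, cost, index of the reverse edge].
    graph = [[] for _ in range(n)]

    def add_edge(a, b, cap, cost):
        graph[a].append([b, cap, cost, len(graph[b])])
        graph[b].append([a, Fraction(0), -cost, len(graph[a]) - 1])

    for i, u in enumerate(us):
        add_edge(src, i, Fraction(mu[u]), Fraction(0))
    for j, v in enumerate(vs):
        add_edge(len(us) + j, snk, Fraction(nu[v]), Fraction(0))
    for i, u in enumerate(us):
        for j, v in enumerate(vs):
            add_edge(i, len(us) + j, Fraction(1), Fraction(d[u, v]))

    total = Fraction(0)
    while True:
        # Bellman-Ford: cheapest path from src to snk in the residual graph.
        dist, prev = [None] * n, [None] * n
        dist[src] = Fraction(0)
        for _ in range(n - 1):
            for a in range(n):
                if dist[a] is None:
                    continue
                for i, (b, cap, cost, _rev) in enumerate(graph[a]):
                    if cap > 0 and (dist[b] is None or dist[a] + cost < dist[b]):
                        dist[b], prev[b] = dist[a] + cost, (a, i)
        if dist[snk] is None:
            return total
        push, v = None, snk
        while v != src:  # bottleneck capacity along the path
            a, i = prev[v]
            cap = graph[a][i][1]
            push = cap if push is None else min(push, cap)
            v = a
        v = snk
        while v != src:  # augment along the path
            a, i = prev[v]
            edge = graph[a][i]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = a
        total += push * dist[snk]


def distance_iteration(states, tau, D0, D1, alpha):
    """Return (lower, upper, rounds) with upper - lower < alpha on every pair."""
    pairs = [(s, t) for s in states for t in states]
    lower = {p: Fraction(1 if p in D1 else 0) for p in pairs}
    upper = {p: Fraction(0 if p in D0 else 1) for p in pairs}
    # Pairs outside D0 and D1 have equal labels, so Delta is the coupling cost.
    open_pairs = [p for p in pairs if p not in D0 and p not in D1]
    rounds = 0
    while any(upper[p] - lower[p] >= alpha for p in open_pairs):
        new_lower, new_upper = dict(lower), dict(upper)
        for s, t in open_pairs:
            if lower[s, t] != upper[s, t]:
                new_lower[s, t] = min_coupling_cost(tau[s], tau[t], lower)
                new_upper[s, t] = min_coupling_cost(tau[s], tau[t], upper)
        lower, upper = new_lower, new_upper
        rounds += 1
    return lower, upper, rounds


# Hypothetical chain in which the distance of (s1, s2) solves x = x/2 + 1/4.
states = ["h", "t", "s1", "s2"]
label = {"h": "H", "t": "T", "s1": "c", "s2": "c"}
tau = {
    "h": {"h": Fraction(1)},
    "t": {"t": Fraction(1)},
    "s1": {"s1": Fraction(1, 2), "h": Fraction(1, 4), "t": Fraction(1, 4)},
    "s2": {"s2": Fraction(1, 2), "h": Fraction(1, 2)},
}
D0 = {(s, s) for s in states}  # assumed output of step 1
D1 = {(s, t) for s in states for t in states if label[s] != label[t]}  # step 2
alpha = Fraction(1, 100)
lower, upper, rounds = distance_iteration(states, tau, D0, D1, alpha)
```

In this example the exact distance of (s1, s2) is 1/2, and the two bounds close in on it from below and above, halving the gap in every round.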


Let us use randomized quicksort, introduced in Sect. 7, and the randomized
self-stabilising algorithm due to Herman [14], introduced in Sect. 5, as examples.
Recall that for the randomized self-stabilising algorithm, when N = 7, the number
of non-trivial distances is 11,032, which we are not able to handle using the
simple policy iteration algorithm. We apply the approximation algorithm to this
model and to the randomized quicksort example with 82 states and present the
results below. The accuracy $\alpha$ is set to 0.01.

The approximation algorithm for randomized quicksort runs for about
14 min, which is about 3 to 4 times faster than the algorithm for small distances in
Sect. 7. For the randomized self-stabilising algorithm with 128 states, the
approximation algorithm terminates in about 54 h. Although the number of
non-trivial distances for the randomized self-stabilising algorithm is only about
5 times that of randomized quicksort, the running time is more than 200 times
longer. It is unknown whether this approximation algorithm has exponential
running time.


Model                                   |S|   Non-trivial   D0 + D1 + DI
Randomized quicksort                     82         2,300   14 min
Randomized self-stabilising algorithm   128        11,032   54 h


9 Conclusion 


In this paper, we have presented a decision procedure for probabilistic bisim- 
ilarity distance one. This decision procedure provides the basis for three new 
algorithms to compute and approximate the probabilistic bisimilarity distances 
of a labelled Markov chain. The first algorithm decides distance zero, then decides 
distance one, and finally uses simple policy iteration to compute the remaining 
distances. As shown experimentally, this new algorithm significantly improves 
the state of the art algorithm that only decides distance zero and then uses sim- 
ple policy iteration. The second algorithm computes all probabilistic bisimilarity 
distances that are smaller than some given upper bound, by deciding distance 
zero, deciding distance one, computing a query set, and running simple partial 
policy iteration for that query set. This second algorithm can handle labelled 
Markov chains that have considerably more non-trivial distances than our first 
algorithm. The third algorithm approximates the probabilistic bisimilarity
distances up to a given accuracy by deciding distance zero, deciding distance one,
and running distance iteration. This third algorithm can also handle labelled
Markov chains that have considerably more non-trivial distances than our first
algorithm. Whereas we know that the first two algorithms take at least
exponential time in the worst case, the running time of the third algorithm has
not yet been analysed. Moreover, if we are only interested in the probabilistic
bisimilarity distances for a few state pairs, we can use the pre-computation of
distance zero and one to exclude the state pairs with trivial distances. We can
add the remaining state pairs to a query set and run simple partial policy
iteration to get the distances. Alternatively, we can modify the distance
iteration algorithm to approximate the distances for the predefined state pairs.
The details of these new algorithms will be studied in the future.